Posted on March 23, 2009
No, not Linus’ (though I do wonder about that guy sometimes). I’m talking about the Linux OOM (out-of-memory) killer, which I am seriously considering classifying as sentient.
I’m finishing up some Xen-related programs. One of them is a special watchdog daemon that runs on Xen guests and exports system vitals via Xenbus. I’m building a small set of rules that lets the watchdog tell what is, or is not, a recoverable event when it decides whether sending the watchdog ping back to softdog is a good idea.
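The actual rule set isn’t shown here, so what follows is only a minimal Python sketch of the idea, assuming each rule is a predicate over a snapshot of guest vitals and that the daemon pings softdog only when every rule reports a recoverable state. The `Vitals` fields, the `RULES` thresholds, and the `should_ping`/`ping_softdog` names are all hypothetical. The one grounded detail is the Linux watchdog API: any write to an open `/dev/watchdog` descriptor resets softdog’s timer.

```python
import os
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Vitals:
    """Hypothetical snapshot of guest vitals; field names are illustrative only."""
    free_mem_kb: int
    load_avg_1m: float
    likely_oom_victims: int  # processes judged likely to be the OOM killer's "next" victim


# A rule returns True if the condition it checks is still recoverable.
Rule = Callable[[Vitals], bool]

# Hypothetical thresholds; a real daemon would tune these per guest.
RULES: List[Rule] = [
    lambda v: v.free_mem_kb > 4096,
    lambda v: v.load_avg_1m < 50.0,
    lambda v: v.likely_oom_victims < 3,
]


def should_ping(vitals: Vitals, rules: List[Rule] = RULES) -> bool:
    """Ping softdog only if every rule reports a recoverable state."""
    return all(rule(vitals) for rule in rules)


def ping_softdog(fd: int) -> None:
    """Any write to an open /dev/watchdog descriptor resets softdog's timer."""
    os.write(fd, b"\n")
```

In a polling loop, the daemon would open the device once with `os.open("/dev/watchdog", os.O_WRONLY)` and call `ping_softdog(fd)` only when `should_ping(...)` is true; if it withholds the ping long enough, softdog (with its default settings) reboots the guest once the timeout expires.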
Naturally, one would want a tally of how many victims (I mean processes) the OOM killer has claimed. Of course, no such statistics are readily available via the /proc interface (that I can find, anyway). This is the second time today that I’ve encountered this psychotic pest. The first was during a live migration test to see if I could reproduce a friend’s bug: migrating a domain that has more RAM than dom-0 sometimes results in dom-0’s OOM killer sending xend to a watery grave.
As for the watchdog, I can work around the lack of centralized statistics for OOM events: it’s quite cheap to just iterate through the process list and collect scores while figuring out how many victims are likely to be ‘next’. Still, it would be wonderful to know how many victims there have been, readily, via /proc.
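A minimal Python sketch of that workaround, assuming the kernel exposes a per-process badness value in `/proc/<pid>/oom_score` (higher means more likely to be picked) and a `Name:` line in `/proc/<pid>/status`. The function names and the `top_n` cutoff for deciding who is ‘next’ are hypothetical, not something taken from the watchdog itself.

```python
import os
from typing import List, Tuple


def read_name(status_path: str) -> str:
    """Return the process name from the 'Name:' line of /proc/<pid>/status."""
    with open(status_path, errors="replace") as f:
        for line in f:
            if line.startswith("Name:"):
                return line.split(":", 1)[1].strip()
    return "?"


def oom_scores(proc_root: str = "/proc") -> List[Tuple[int, int, str]]:
    """Return (oom_score, pid, name) for every readable process, highest score first."""
    results = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        base = os.path.join(proc_root, entry)
        try:
            with open(os.path.join(base, "oom_score")) as f:
                score = int(f.read().strip())
            name = read_name(os.path.join(base, "status"))
        except (OSError, ValueError):
            continue  # process exited mid-scan, or the file was unreadable
        results.append((score, int(entry), name))
    results.sort(reverse=True)
    return results


def likely_next(proc_root: str = "/proc", top_n: int = 3) -> List[Tuple[int, int, str]]:
    """Hypothetical heuristic: the top_n highest-scoring processes are the likely victims."""
    return oom_scores(proc_root)[:top_n]
```

Processes that exit between `listdir` and `open` are simply skipped, which keeps a single scan cheap and tolerant of a busy process table.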
I may write a patch that adds oom_count: to the bottom of /proc/stat, but man, I really hate touching the kernel for such a simple need, never mind portability. I also don’t want to parse system messages once every 45 seconds just to count OOM events.
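For comparison, here is roughly what the log-parsing route might look like. This is a Python sketch under two stated assumptions: the kernel log lands in a plain file (the path, such as `/var/log/kern.log`, depends on the syslog setup), and each kill is logged with a line containing `Killed process <pid>`, whose exact wording varies between kernel versions. `OomLogCounter` and `count_oom_kills` are hypothetical names. Remembering the file offset means each 45-second poll reads only what was appended since the previous one.

```python
import os
import re
from typing import Iterable

# Assumption: each OOM kill is logged with a line containing "Killed process <pid>".
# The exact wording differs between kernel versions, so adjust the pattern as needed.
KILL_RE = re.compile(r"Killed process \d+")


def count_oom_kills(lines: Iterable[str]) -> int:
    """Count log lines that look like an OOM kill."""
    return sum(1 for line in lines if KILL_RE.search(line))


class OomLogCounter:
    """Keeps a running OOM kill tally, reading only new log data on each poll."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.offset = 0
        self.total = 0

    def poll(self) -> int:
        # If the file shrank (e.g. it was rotated), start over from the beginning.
        if os.path.getsize(self.path) < self.offset:
            self.offset = 0
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only consume complete lines; a partially written line is re-read next poll.
        end = data.rfind(b"\n") + 1
        chunk = data[:end].decode("utf-8", errors="replace")
        self.total += count_oom_kills(chunk.splitlines())
        self.offset += end
        return self.total
```

Even done incrementally, this still ties the count to log wording and log rotation, which is exactly why a counter exported through /proc would be nicer.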
So, to summarize: Linux’s inner child (named OOM) is not only a very efficient killer, it’s also very good at concealing its body count.
One of the most fun aspects of writing software to run on a privileged Xen domain is that you must be extra defensive: the privileged domain typically has less memory than old commodity Pentium Pro desktops. Yet some things designed to run on the privileged domain allocate memory as if they were a relational database server. So perhaps this lunatic killer should be elusive; it keeps everyone else in check.