Posted on | August 4, 2008 | 1 Comment
It can sometimes be difficult to explain abstract concepts to people who don’t have programming backgrounds. I was working on a really simple single system image cluster for someone, all that we had left to complete was the cluster announce mechanism and heartbeat configuration.
The design called for a completely automated / self-recovering setup. Even supervisory nodes were supervising each-other. If one controller went down, the other would reboot it. I found an instance where both controllers would be caught in a perpetual re-boot cycle.
I tried explaining “This is really racy … “, the tech that I was working with did not get it. So I explained again “The way these things join after recovery presents an interesting race condition”, still a blank stare.
Finally I asked “Would you take a laxative and a sleeping pill at the same time?” , he immediately understood. A few small patches later, it worked just fine.