Consequences: Power Outages and their effects
April 27th, 2009 by herlo
This Sunday I got a call from our VPS provider: there had been a power outage at the colo where our virtual machine is housed. I still have a few questions about whether the machine was on a UPS and why it took so long to get power restored. I guess I’ll get to those soon enough. In the meantime, the outage caused some unintended downtime and turned out to be a good test of all the system stuff we have running.
Specifically, the Utah Open Source Planet failed to update for two days, which, if you are following our feed, is probably why you just saw a bunch of posts come through. The software I use, planet, turned out to be working fine, but the errors it logged were misleading and made it look like the culprit. The error was:
ERROR:planet:Error 500 while updating feed <url://to.planet/rss.feed>
This error appeared for each of the feeds in our planet, but it was not planet’s fault. First, I grabbed one of the URLs and tried it in my browser: no problem, the RSS feed loaded just fine. After a few minutes of digging, I started looking at the network on the server itself. What I discovered was that the server wasn’t able to resolve any hostname.
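A quick way to tell "the feeds are broken" apart from "name resolution is broken" is to try resolving the feed hostnames directly on the affected machine. The following is only a rough sketch, assuming Python is available there; the function name `unresolved_hosts` and the example hostnames are made up for illustration and are not part of planet.

```python
import socket


def unresolved_hosts(hostnames, resolve=socket.gethostbyname):
    """Return the subset of hostnames that fail to resolve.

    `resolve` defaults to the system resolver; it is a parameter only so
    the check can be exercised without touching the network.
    """
    failed = []
    for name in hostnames:
        try:
            resolve(name)
        except OSError:
            # socket.gaierror (lookup failure) is a subclass of OSError.
            failed.append(name)
    return failed


if __name__ == "__main__":
    # Hypothetical feed hosts; substitute the ones from your planet config.
    feeds = ["planet.example.org", "blog.example.com"]
    broken = unresolved_hosts(feeds)
    if len(broken) == len(feeds):
        print("Nothing resolves: suspect DNS, not the feeds")
    elif broken:
        print("Could not resolve:", ", ".join(broken))
    else:
        print("All hostnames resolve")
```

If every hostname fails at once, the resolver is a much more likely suspect than the individual feeds.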
Because we run our own DNS servers, we also point our machines at those servers. However, when the machine came back up after the power outage on Sunday, the DNS server did not start, because we had never set it to start automatically on boot. A quick chkconfig command fixed that, and we’re back up. So the consequence in this case was a lack of planet updates for a while.
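To catch this sort of thing before the next outage, one option is to look at what `chkconfig --list` reports for each service that matters. The sketch below parses a single line of that output and says whether the service is switched on for a given runlevel. The service name `named` and the sample lines are assumptions for illustration (the actual DNS daemon and runlevels on our box may differ), and the helper names are hypothetical.

```python
def enabled_runlevels(chkconfig_line):
    """Parse one non-empty line of `chkconfig --list` output.

    Expected shape: "<service> 0:off 1:off 2:on 3:on 4:on 5:on 6:off".
    Returns (service_name, set_of_runlevels_where_state_is_on).
    """
    fields = chkconfig_line.split()
    service = fields[0]
    levels = set()
    for field in fields[1:]:
        level, _, state = field.partition(":")
        if state == "on":
            levels.add(int(level))
    return service, levels


def starts_on_boot(chkconfig_line, runlevel=3):
    """True if the service on this line is enabled for `runlevel`."""
    _, levels = enabled_runlevels(chkconfig_line)
    return runlevel in levels


if __name__ == "__main__":
    # Illustrative sample lines, not captured from the real server.
    before = "named  0:off 1:off 2:off 3:off 4:off 5:off 6:off"
    after = "named  0:off 1:off 2:on 3:on 4:on 5:on 6:off"
    for label, line in (("before", before), ("after", after)):
        print(label, "starts on boot:", starts_on_boot(line))
```

Running a check like this over the handful of services a machine depends on is a cheap stand-in for an actual reboot test, though it does not replace one.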
However, I feel we were lucky (and good). For the most part, things worked, which is a good sign. Still, I think this goes back to one key principle in a system administrator’s repertoire: test your systems to validate that they come back up correctly. Lucky for us, this was a minor service, but a service nonetheless. I’m glad that everything else worked, and it’s been a good testing day for that reason.
Cheers,
Clint