Outages, Monitoring and Being Prepared

Posted on 22 April 2011


Last week, Lori MacVittie had a blog post on DevCentral about earning your data center merit badge. The message was delivered up front and it was simple to understand: Be prepared. MacVittie is right, of course: the best way to stay out of trouble is to put systems in place that prevent it from happening in the first place.

But yesterday’s outage at Amazon EC2 showed us something else: no matter how well prepared you are, things happen that are totally beyond your control, and they can spiral pretty quickly. And if you think you have nothing to worry about because you don’t use a public cloud Infrastructure as a Service (IaaS) offering like Amazon EC2, think again.

If it can happen to Amazon, it can happen to you, because at its heart what is Amazon but a giant data center whose core business is keeping other businesses going? That suggests Thursday’s outage was something extraordinary, able to bypass all of the fail-safes that a system like Amazon’s has to have in place to keep things going. Yesterday it all fell apart, and it could just as easily happen to you because chances are your data center doesn’t have nearly the number of contingencies in place that Amazon has.

That means you’re probably closer to a disaster like yesterday morning’s than Amazon ever was (yet it happened to Amazon anyway).

None of this is meant to scare you, because IT pros know the score about these things, but it is a reminder that having systems in place to monitor and alert you *before* disaster strikes is more important than ever. Even so, it may not matter how prepared you are if a disaster strikes that’s completely beyond the scope of anything you could have imagined in a reasonable contingency plan.

All you can do is follow MacVittie’s simple advice and be prepared for whatever comes. It might not always be enough, but if you do your best, you’ll minimize those major outages and be ready to deal with them when they do happen. But remember: disasters happen to everyone at some point, whether you’re in the cloud or in-house in a data center, and you need to be ready.

Photo by rbrwr on Flickr. Used under Creative Commons License.

