I just read a story on Neowin.net about Microsoft’s recent online service outages. On March 12th, several of Microsoft’s services went down: Outlook.com, SkyDrive, Hotmail and Calendar. The cause was a rapid temperature spike that resulted from a failed firmware update. Was this failure preventable? Yes. With proper monitoring and reporting, it could have been prevented. Here is how:
Report on patch results. When you deploy patches or firmware updates, confirm whether each update succeeded or failed. Patch management tools can report on the success or failure of every deployment, so a failed update is caught before it causes trouble.
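To make the idea concrete, here is a minimal sketch of a patch success/failure report. It is not tied to any particular patch management product: the `PatchResult` record, the host names, the patch ID and the convention that exit code 0 means success are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PatchResult:
    """One deployment attempt (hypothetical record layout)."""
    host: str
    patch_id: str
    exit_code: int  # assumption: 0 means the update applied cleanly


def summarize(results):
    """Split deployment results into succeeded and failed host lists."""
    succeeded = [r.host for r in results if r.exit_code == 0]
    failed = [r.host for r in results if r.exit_code != 0]
    return {"succeeded": succeeded, "failed": failed}


# Made-up sample data: one clean update and one failed update.
results = [
    PatchResult("mail-01", "fw-2.1", 0),
    PatchResult("mail-02", "fw-2.1", 1),
]

report = summarize(results)
for host in report["failed"]:
    print(f"ALERT: update failed on {host}")
```

In practice the results would come from your patch management tool rather than a hard-coded list; the point is simply that every deployment ends with an explicit list of failures someone has to look at.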
Monitor server hardware. With the right server monitoring tool, it’s quite easy to monitor hardware across servers from different vendors. Just yesterday I was speaking with a customer who had recently installed blade chassis monitoring and immediately discovered that a power supply was about to fail. Without monitoring, you have no visibility into temperature spikes, hard drive issues, power supply problems or memory status.
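At its core, hardware monitoring comes down to comparing sensor readings against limits and raising an alert when one is exceeded. The sketch below shows that check in a few lines. The sensor names, limit values and readings are invented for illustration; a real monitoring tool would collect them from the hardware and use vendor-recommended thresholds.

```python
def find_alerts(readings, limits):
    """Return (host, sensor, value, limit) for every reading above its limit."""
    alerts = []
    for host, sensors in readings.items():
        for sensor, value in sensors.items():
            limit = limits.get(sensor)
            # Sensors without a configured limit are ignored.
            if limit is not None and value > limit:
                alerts.append((host, sensor, value, limit))
    return alerts


# Hypothetical limits and readings, not vendor-recommended values.
LIMITS = {"inlet_temp_c": 35.0, "psu_voltage_deviation_pct": 5.0}

readings = {
    "blade-01": {"inlet_temp_c": 24.5, "psu_voltage_deviation_pct": 1.2},
    "blade-02": {"inlet_temp_c": 41.0, "psu_voltage_deviation_pct": 7.5},
}

alerts = find_alerts(readings, LIMITS)
for host, sensor, value, limit in alerts:
    print(f"ALERT {host}: {sensor} = {value} (limit {limit})")
```

Run on a schedule, a check like this would flag a rising temperature or a drifting power supply before it turns into an outage.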
Check out this online demo to explore how these tools work.