I was doing a rundown with a new SysAdmin of one of our main virtual clusters and storage racks at a large financial institution. I was explaining the network connections to the hosts when the new SysAdmin bumped a fuse on one of the two power strips.
I heard all the beeping and quickly hit the wrong power strip, causing the entire cabinet to go dark and bringing down hundreds of virtual servers, including our core financial system. This could have been a career-limiting move, but thanks to some quick thinking and our superhuman SysAdmin powers, we went to work bringing up the NetApp and several Dell hosts. Within 10 minutes everything was back up and running again. During that time we lost only one financial transaction, which was quickly recovered. We later documented the outage as a BCP test and recorded how the whole team stepped up to minimize downtime and screen all the phone calls generated by this short-lived event.
The biggest lesson learned is to stay calm and not assume anything, no matter how bad things look. Had I just stopped and observed for a few minutes rather than shooting from the hip, I could have avoided the entire situation. My CIO at the time mentioned that it was great on-the-job training for the new SysAdmin.
No servers or data were harmed in the making of this life-changing story.