In a previous post, I outlined a couple of steps you should take to ensure that you are best protected against "apocalyptic disaster" on your network. In short:
When you monitor performance or uptime on your network, the depth and quality of your analysis can never be better than the continuity of your data. An NMS that isn't "always up" can't give you anything approximating continuous data; a high-availability NMS can. To put it in another context, security cameras are only useful if they're on and recording, and security guards are only useful if they're present and awake. A dummy camera at least has some deterrent effect, but you don't even get that working in your favor if your NMS isn't up and monitoring.
So you need to have your NMS up, polling, and writing to your database to get any real benefit out of it. The more it is up and the less it is down, the better.
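To make "continuity of your data" a little more concrete, here is a minimal Python sketch that looks for holes in a series of poll timestamps. It is an illustration only: the function names, the 5-minute polling interval, and the 1.5x tolerance are my own assumptions, not anything taken from a particular NMS.

```python
from datetime import datetime, timedelta


def find_gaps(poll_times, interval, tolerance=1.5):
    """Return (start, end) pairs where two consecutive polls are
    more than tolerance * interval apart (hypothetical threshold)."""
    ordered = sorted(poll_times)
    limit = interval * tolerance
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > limit]


def coverage(poll_times, interval):
    """Fraction of the expected polls that were actually recorded
    between the first and the last sample."""
    ordered = sorted(poll_times)
    if len(ordered) < 2:
        return 1.0 if ordered else 0.0
    expected = (ordered[-1] - ordered[0]) // interval + 1
    return min(1.0, len(ordered) / expected)


# Example data: polls every 5 minutes, with an outage between 00:10 and 00:30.
start = datetime(2024, 1, 1, 0, 0)
interval = timedelta(minutes=5)
polls = [start + timedelta(minutes=m) for m in (0, 5, 10, 30, 35)]

gaps = find_gaps(polls, interval)
ratio = coverage(polls, interval)
```

With the sample data above, the outage shows up as one gap, and only five of the eight expected polls are present. Any report built on that window is missing more than a third of its data.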
It's inevitable that you're going to need to take your NMS server down for maintenance from time to time and, as we've been discussing, it's even possible that it might "fall down" all on its own. These are the cases when you really need a failover solution. A good failover solution effectively provides a backup NMS to fill in when your primary NMS can't get the job done. Ideally, the transition from primary NMS server to backup NMS server is seamless: your primary NMS server goes down and your backup NMS server picks up polling and writing. Ideally, this happens fast enough that you don't miss anything.
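The basic idea can be sketched in a few lines of Python. This is a toy heartbeat model under my own assumptions, not a description of how any specific failover product works: the class name, the 15-second timeout, and the rule that the primary takes over again as soon as its heartbeat returns are all hypothetical.

```python
class FailoverMonitor:
    """Toy model: the backup takes over when the primary's heartbeat goes stale."""

    def __init__(self, timeout_s=15.0):
        # Hypothetical threshold: how long the primary may stay silent.
        self.timeout_s = timeout_s
        self.last_heartbeat = None

    def record_heartbeat(self, now):
        """Called each time the primary reports that it is alive."""
        self.last_heartbeat = now

    def active_server(self, now):
        """Return which server should be polling and writing at time `now`."""
        if self.last_heartbeat is None:
            return "backup"  # the primary has never been heard from
        if now - self.last_heartbeat > self.timeout_s:
            return "backup"  # the primary has been silent for too long
        return "primary"


# Times are plain seconds so the example is easy to follow.
monitor = FailoverMonitor(timeout_s=15.0)
monitor.record_heartbeat(0)
before_outage = monitor.active_server(10)   # heartbeat is still fresh
during_outage = monitor.active_server(16)   # heartbeat is 16 s old
monitor.record_heartbeat(20)
after_recovery = monitor.active_server(21)  # primary is reporting again
```

The timeout is the trade-off to watch: a shorter value means less missed polling when the primary dies, but a greater chance of failing over on a brief network hiccup.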
Failover protection is, of course, a bit more complicated in its technical details. You can find more information in the "Orion Failover Engine Concepts" section of the SolarWinds Failover Engine Administrator Guide.
In a future post, I'll talk about what a good disaster recovery solution can do for you after an apocalyptic network disaster.