Systems monitoring has become a very important piece of the infrastructure puzzle. There might not be a more important part of your overall design than having a good systems monitoring practice in place. There are good options for cloud-hosted, on-premises, and hybrid infrastructures. Whatever situation you are in, it is important to choose a systems monitoring tool that works best for your organization and delivers the metrics that are crucial to its success. Once the decision has been made and the tool (or tools) implemented, it’s time to look at the best practices that ensure it delivers everything expected of it and the best possible return on investment.
The term “best practice” is known to be overused by slick salespeople the world over; however, there is a place for it in the discussion of monitoring tools. The last thing anyone wants to do is purchase a monitoring tool and install it, only to watch it slowly die and become shelfware. So, let’s look at what I consider to be the top five best practices for systems monitoring.
1. Prediction and Prevention
We’ve all heard the adage that “an ounce of prevention is worth a pound of cure.” Is your systems monitoring tool delivering metrics that help point out where things might go wrong in the near future? Are you over-taxing your CPU? Running out of memory? Are there networking bottlenecks that need to be addressed? A good monitoring tool will include a prediction engine that will alert you to issues before they become catastrophic.
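Vendors implement prediction in different ways, and most do not document their models in detail. As a rough illustration only, the Python sketch below fits a straight-line trend to recent utilization samples (for example, disk or memory usage in percent) and estimates how many hours remain before a threshold is crossed. The function names, the 90% threshold, and the 24-hour warning horizon are hypothetical choices for this example; a real prediction engine would likely need to handle seasonality and noisy data, which a simple linear fit does not.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Forecast:
    slope_per_hour: float                # fitted growth, in percentage points per hour
    hours_to_threshold: Optional[float]  # None when usage is flat or falling


def forecast_exhaustion(
    hours: Sequence[float],
    usage_pct: Sequence[float],
    threshold_pct: float = 90.0,  # hypothetical alert threshold
) -> Forecast:
    """Fit usage = intercept + slope * t by ordinary least squares and
    extrapolate to the time at which usage reaches threshold_pct."""
    n = len(hours)
    if n < 2 or n != len(usage_pct):
        raise ValueError("need at least two paired samples")
    mean_t = sum(hours) / n
    mean_u = sum(usage_pct) / n
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("sample times must not all be equal")
    cov = sum((t - mean_t) * (u - mean_u) for t, u in zip(hours, usage_pct))
    slope = cov / var_t
    intercept = mean_u - slope * mean_t
    if slope <= 0:
        return Forecast(slope, None)  # not trending toward the threshold
    t_cross = (threshold_pct - intercept) / slope
    # Hours from the most recent sample; 0.0 if the trend line is already past the threshold.
    return Forecast(slope, max(0.0, t_cross - hours[-1]))


def should_alert(forecast: Forecast, horizon_hours: float = 24.0) -> bool:
    """Raise an early warning if the threshold is projected within the horizon."""
    return (
        forecast.hours_to_threshold is not None
        and forecast.hours_to_threshold <= horizon_hours
    )
```

For samples of 60, 62, 64, and 66 percent taken an hour apart, the fitted slope is 2 points per hour, so the 90% threshold is about 12 hours away and a 24-hour horizon would raise an early warning.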
2. Customize and Streamline Monitoring
For an administrator, being tasked with implementing systems monitoring can bring lots of anxiety and visions of endless, seemingly useless emails filling up the inbox. It doesn’t have to be that way. The administrator needs to triage which events should trigger an email alert and customize the reporting accordingly. Along with email alerts, most tools let you create custom dashboards that track what matters most to your organization. Without some level of customization, systems monitoring can quickly become an annoying, confusing mess.
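One way to picture the triage step is as a small routing policy: only alerts at or above a chosen severity go to email, repeats of the same check are held back for a while, and everything else lands on the dashboard. The sketch below assumes that policy; the severity levels, the one-hour suppression window, and the check names are hypothetical and not taken from any particular monitoring product.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Dict


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass
class Alert:
    check: str          # hypothetical naming scheme, e.g. "disk:web01"
    severity: Severity
    timestamp: float    # seconds since the epoch


@dataclass
class AlertRouter:
    email_min_severity: Severity = Severity.CRITICAL  # only this level and above is emailed
    dedup_window_s: float = 3600.0                    # suppress repeat emails for one hour
    _last_emailed: Dict[str, float] = field(default_factory=dict, init=False)

    def route(self, alert: Alert) -> str:
        """Return "email" or "dashboard" for a single alert."""
        if alert.severity < self.email_min_severity:
            return "dashboard"
        last = self._last_emailed.get(alert.check)
        if last is not None and alert.timestamp - last < self.dedup_window_s:
            return "dashboard"  # already emailed recently; don't flood the inbox
        self._last_emailed[alert.check] = alert.timestamp
        return "email"
```

Raising `email_min_severity` or widening `dedup_window_s` trades inbox noise for the risk of missing something, which is exactly the judgment call the administrator has to make.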
3. Include Automation
Automation can be a very powerful tool and can save the administrator a ton of time. In short, automation makes life better, so long as it’s implemented correctly. Many tools today have an automation feature that lets you either create your own automation scripts or choose from a list of common, out-of-the-box ones. This best practice ties in with the first one on this list, prediction and prevention. For example, when the tool notices that a VM is running low on memory, it can reach back to vCenter and add more memory before it’s too late, assuming it has been configured to do so. This makes life much easier, but proceed with caution: you don’t want your monitoring tool doing too much, and it’s easy to be overly aggressive with automation.
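The caution about over-aggressive automation can be made concrete with guardrails. This sketch wraps a hypothetical `resize` hook (a placeholder for whatever call your tool or hypervisor API actually provides; it is not a real vCenter function) with a trigger threshold, a fixed step size, a hard ceiling, and a cooldown period, and it escalates to a human instead of acting once the ceiling is reached. All thresholds shown are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class MemoryRemediation:
    """Guardrails around an automated "add memory" action."""
    resize: Callable[[str, int], None]  # hypothetical hook: (vm_name, new_size_mb) -> None
    trigger_pct: float = 90.0           # act only at or above this memory utilization
    step_mb: int = 2048                 # how much memory to add per action
    max_mb: int = 32768                 # never grow a VM beyond this size
    cooldown_s: float = 1800.0          # minimum time between actions on the same VM
    _last_action: Dict[str, float] = field(default_factory=dict, init=False)

    def maybe_remediate(self, vm: str, used_pct: float, current_mb: int, now: float) -> str:
        if used_pct < self.trigger_pct:
            return "no action: below threshold"
        last = self._last_action.get(vm)
        if last is not None and now - last < self.cooldown_s:
            return "no action: cooling down"  # give the last change time to take effect
        new_mb = min(current_mb + self.step_mb, self.max_mb)
        if new_mb <= current_mb:
            return "escalate: at maximum size, page a human"
        self.resize(vm, new_mb)
        self._last_action[vm] = now
        return f"resized {vm} to {new_mb} MB"
```

The cooldown and the ceiling are what keep a misbehaving application from quietly consuming an entire host through repeated automatic resizes.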
4. Documentation Saves the Day
Document, document, document everything you do with your systems monitoring tool. The last thing you want is for an alert to come up and the night shift guy on your operations team not to know what to do with it. “Ah, I’ll just acknowledge the alarm and reset it to green, I don’t even know what IOPS are anyways.” Yikes! If you have a “run book” or manual that covers everything about the tool (where to look for alerts, who to call, how to log in, and so on), you can relax knowing that if something goes wrong, the guy with the manual will know what to do. Also track changes to the document, so you can see what is being changed and verify that each change is legitimate and approved.
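Part of a run book can also live in a machine-readable form that the on-call engineer can query. The sketch below assumes a simple mapping from alert name to its meaning, the steps to take, and who to contact; the alert name, steps, and contacts are hypothetical placeholders. An unknown alert returns an escalation entry rather than nothing, so the default is never “reset it to green.”

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RunbookEntry:
    meaning: str
    steps: Tuple[str, ...]
    contact: str


# Hypothetical entries; replace with your own alerts, procedures, and contacts.
RUNBOOK: Dict[str, RunbookEntry] = {
    "high_disk_latency": RunbookEntry(
        meaning="Storage is responding slowly; the datastore may be IOPS-bound.",
        steps=(
            "Identify which VMs on the datastore are generating the most I/O.",
            "Check for recent changes such as backups or batch jobs.",
            "Leave the alert open until the cause is identified.",
        ),
        contact="storage on-call",
    ),
}

# Fallback used when an alert has no documented procedure.
UNDOCUMENTED = RunbookEntry(
    meaning="No run book entry exists for this alert.",
    steps=("Escalate to the contact below; do not acknowledge or reset the alarm.",),
    contact="operations lead",
)


def lookup(alert_name: str) -> RunbookEntry:
    """Return the documented procedure for an alert, or the escalation fallback."""
    return RUNBOOK.get(alert_name, UNDOCUMENTED)
```

Keeping a file like this in version control is one way to get the change tracking described above, since every edit to a procedure then has an author, a timestamp, and a review trail.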
5. Choose Wisely
Last, but definitely not least, pick the right tool for the job. If you’ve migrated your entire workload to the cloud, don’t mess around with an on-premises solution for systems monitoring. Just use the cloud provider’s proprietary tool and run with it. That said, get educated on their tool and make sure you can customize it to your liking. Don’t pick a tool based on price alone. Shop around and focus on the options and customization the tool offers. Always choose a tool that achieves your organization’s goals for systems monitoring. The latest isn’t always the greatest.
Putting these best practices in place is a smart way to help ensure your tool of choice performs at its best and gives you the metrics you need to feel good about what’s going on in your data center.