On Day Zero of being a DBA I inherited a homegrown monitoring system. It didn't do much, but it did what was needed. Over time we modified it to suit our needs. Eventually we got to the point where we integrated with OpsMgr to automate the collection and deployment of monitoring alerts and code to our database servers. It was awesome.
The experience of building and maintaining my own homegrown system, combined with working for software vendors, has taught me that every successful monitoring platform needs five essential components: identify, collect, share, integrate, and govern. Let's break down what each of those means.
A necessary first step is to identify the data and metrics you want to monitor and alert on. I would start this process by looking at a metric and putting it into one of two classes: informational or actionable. Informational metrics were the ones I wanted to track but didn't need to be alerted on. Actionable metrics were the ones I needed to be alerted on, because I had to take some action in response. For more details on how to identify what metrics you want to collect, check out the Monitoring 101 guide, and look for the Monitoring 201 guide coming soon.
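To make the two classes concrete, here is a minimal sketch of how a metric catalog could record them. Everything in it is an assumption made for illustration: the `Metric` and `MetricClass` names, the `metrics_to_alert_on` helper, and the example metric names are hypothetical and do not come from any particular monitoring product.

```python
from dataclasses import dataclass
from enum import Enum


class MetricClass(Enum):
    INFORMATIONAL = "informational"  # track it, but do not alert on it
    ACTIONABLE = "actionable"        # alert, because someone must respond


@dataclass(frozen=True)
class Metric:
    name: str
    metric_class: MetricClass
    response: str = ""  # the action expected when an actionable metric fires


# Hypothetical catalog; the metric names are examples only.
CATALOG = [
    Metric("database_size_mb", MetricClass.INFORMATIONAL),
    Metric("failed_backup_count", MetricClass.ACTIONABLE, "rerun the backup job"),
    Metric("page_life_expectancy", MetricClass.INFORMATIONAL),
]


def metrics_to_alert_on(catalog):
    """Return only the metrics that should raise an alert."""
    return [m for m in catalog if m.metric_class is MetricClass.ACTIONABLE]


alert_names = [m.name for m in metrics_to_alert_on(CATALOG)]
```

Requiring a `response` for every actionable metric is a useful test of the classification: if you can't say what you would do when the alert fires, the metric is probably informational.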
After you identify the metrics you want, you need to decide how you want to collect and store them for later use. This is where flexibility becomes important. Your collection mechanism needs to be able to consume data in varying formats. If you build a system that relies on data being in a perfect state, you will find yourself frustrated the first time some imperfect data is loaded. You will also find yourself spending far too much time playing the role of data janitor.
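One way to build in that flexibility is to accept more than one input format and set imperfect rows aside instead of failing the whole load. The sketch below assumes each payload is either a JSON array of objects or CSV text with `metric` and `value` fields. The function names and field names are hypothetical, chosen only for this example.

```python
import csv
import io
import json
import math


def parse_value(raw):
    """Convert a raw reading to a finite float, or return None if unusable."""
    try:
        value = float(str(raw).strip().replace(",", ""))
    except (TypeError, ValueError):
        return None
    return value if math.isfinite(value) else None


def collect(payload, fmt):
    """Parse a JSON or CSV payload into (accepted, rejected) lists of rows."""
    if fmt == "json":
        rows = json.loads(payload)
    elif fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    else:
        raise ValueError(f"unsupported format: {fmt}")

    accepted, rejected = [], []
    for row in rows:
        name = (row.get("metric") or "").strip()
        value = parse_value(row.get("value"))
        if name and value is not None:
            accepted.append({"metric": name, "value": value})
        else:
            rejected.append(row)  # keep the bad row for later review
    return accepted, rejected
```

Returning the rejected rows, rather than silently dropping them or raising an error, keeps the load running while leaving a record of what needs cleaning up.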
Now that your data is being collected, you will want to share it with others, especially when you want to help provide some details about specific issues and incidents. As much as you might love the look of raw data and decimal points, chances are that other people will want to see something prettier. And there's a good chance they will want to be able to export data in a variety of formats, too. More than 80% of the time your end-users will be fine with the ability to export to CSV format.
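A CSV export does not need to be elaborate. As a rough sketch, assuming the collected rows are dictionaries with a numeric `value` field (the `export_csv` name and the column names are made up for this example), Python's standard `csv` module can do the work:

```python
import csv
import io


def export_csv(rows, columns):
    """Render rows as CSV text, keeping only the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        extrasaction="ignore",  # skip internal fields the reader didn't ask for
        lineterminator="\n",
    )
    writer.writeheader()
    for row in rows:
        # Round for readability; the raw value stays in the source data.
        writer.writerow({**row, "value": round(row["value"], 2)})
    return buffer.getvalue()


rows = [
    {"server": "db01", "metric": "cpu_percent", "value": 41.6667, "raw_id": 1},
    {"server": "db02", "metric": "cpu_percent", "value": 12.5, "raw_id": 2},
]
report = export_csv(rows, ["server", "metric", "value"])
```

The rounding happens only on the way out, so people get the prettier numbers they want without the stored data losing precision.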
With your system humming along, collecting data, you are going to find that other groups will want that data. It could also be the case that you need to feed your data into other monitoring systems. Designing a system that can integrate well with other systems requires a lot of flexibility. It's best that you think about this now, before you build anything, as opposed to trying to make a square peg fit in a round hole later. And it doesn't have to be perfect for every possible case. Just focus on the major integration method used the world over that I already mentioned: CSV.
Governance is the component that is overlooked most often. Once a system is up and running, very few people consider the task of data governance. It's important that you take the time to define what the metrics are and where they came from. Anyone consuming your data will need this information as well. And if you change a collection, you need to communicate the changes, and the possible impacts they may have, to anyone downstream.
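Governance can start as something as small as a data dictionary that records each metric's definition, its source, and who consumes it. The sketch below is one hypothetical way to do that. The `MetricDefinition` class, its fields, and the example values are assumptions for illustration, and the "notification" is just a list of messages, not a real delivery mechanism.

```python
from dataclasses import dataclass, field

EDITABLE_FIELDS = ("description", "source", "unit")


@dataclass
class MetricDefinition:
    name: str
    description: str  # what the metric is
    source: str       # where it came from
    unit: str
    consumers: list = field(default_factory=list)  # downstream teams or systems
    history: list = field(default_factory=list)    # log of changes

    def change(self, note, **updates):
        """Apply a change, log it, and return one notice per consumer."""
        for attr in updates:
            if attr not in EDITABLE_FIELDS:
                raise AttributeError(f"cannot change field: {attr}")
        for attr, new_value in updates.items():
            setattr(self, attr, new_value)
        self.history.append(note)
        return [f"To {c}: {self.name} changed: {note}" for c in self.consumers]


# Example values are invented for this sketch.
backup_minutes = MetricDefinition(
    name="backup_duration",
    description="Elapsed time of the nightly full backup",
    source="nightly backup job log",
    unit="minutes",
    consumers=["capacity team", "ops dashboard"],
)
notices = backup_minutes.change("unit switched from minutes to seconds", unit="seconds")
```

Tying the notices to the change itself means a collection can't be altered without producing the message that downstream consumers need to see.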
When you put those five components together you have the foundation for a solid monitoring application. I'd even surmise that these five components would serve any application well, regardless of purpose.