
Top 5 Best Practices for Systems Monitoring

Level 11

Systems monitoring has become a very important piece of the infrastructure puzzle. There might not be a more important part of your overall design than having a good systems monitoring practice in place. There are good options for cloud-hosted, on-premises, and hybrid infrastructures. Whatever situation you are in, it is important that you choose a systems monitoring tool that works best for your organization and delivers the metrics that are crucial to its success. Once the decision has been made and the systems monitoring tool(s) have been implemented, it’s time to look at the best practices that ensure the tool delivers everything expected of it, for the best possible return on investment.

The term “best practice” has been known to be overused by slick salespeople the world over; however, there is a place for it in the discussion of monitoring tools. The last thing anyone wants to do is purchase a monitoring tool and install it, only for it to slowly die and become shelfware. So, let’s look at what I consider to be the top 5 best practices for systems monitoring.

1. Prediction and Prevention

We’ve all heard the adage that “an ounce of prevention is worth a pound of cure.”  Is your systems monitoring tool delivering metrics that help point out where things might go wrong in the near future? Are you over-taxing your CPU? Running out of memory? Are there networking bottlenecks that need to be addressed? A good monitoring tool will include a prediction engine that will alert you to issues before they become catastrophic. 
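To make “prediction” concrete, here is a minimal sketch of the kind of trend check a prediction engine might run, assuming you can export usage samples (hours since some start point, usage in percent) for a metric such as disk or memory. It fits an ordinary least-squares line and estimates how many hours remain until a threshold is crossed. The function name `hours_until_threshold` and the 90% default are illustrative assumptions, not any particular product’s behavior; real engines typically use more robust models that handle seasonality and outliers.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    hours: Sequence[float],
    usage_pct: Sequence[float],
    threshold_pct: float = 90.0,  # hypothetical alerting threshold
) -> Optional[float]:
    """Estimate hours after the last sample until usage reaches threshold_pct.

    Returns None when the fitted trend is flat or falling (no predicted breach)
    and 0.0 when the latest sample is already at or above the threshold.
    """
    if len(hours) != len(usage_pct) or len(hours) < 2:
        raise ValueError("need at least two paired samples")
    n = len(hours)
    mean_t = sum(hours) / n
    mean_u = sum(usage_pct) / n
    # Ordinary least-squares slope: cov(t, u) / var(t)
    var_t = sum((t - mean_t) ** 2 for t in hours)
    if var_t == 0:
        raise ValueError("timestamps must not all be equal")
    cov_tu = sum((t - mean_t) * (u - mean_u) for t, u in zip(hours, usage_pct))
    slope = cov_tu / var_t
    intercept = mean_u - slope * mean_t

    if usage_pct[-1] >= threshold_pct:
        return 0.0
    if slope <= 0:
        return None
    # Solve intercept + slope * t = threshold for t, measured from the last sample
    t_hit = (threshold_pct - intercept) / slope
    return max(t_hit - hours[-1], 0.0)
```

For samples of 60, 62, 64 and 66 percent taken an hour apart, the fitted slope is 2 points per hour, so the sketch predicts the 90% line is reached 12 hours after the last sample.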

2. Customize and Streamline Monitoring

When you are tasked, as an administrator, with implementing systems monitoring, it can bring lots of anxiety and visions of endless, seemingly useless emails filling up your inbox. It doesn’t have to be that way. The admin needs to triage which conditions will trigger an email alert and customize the reporting accordingly. Along with email alerts, most tools allow you to create custom dashboards to monitor what is most important to your organization. Without some level of customization, systems monitoring can quickly become an annoying, confusing mess.
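One way to express that triage is as a routing rule: everything lands on the dashboard, but only alerts at or above a chosen severity generate email, and repeats of the same alert within a quiet window are suppressed. The sketch below assumes a simple three-level severity scale and a 15-minute deduplication window; both, and the `route_alerts` helper itself, are hypothetical choices to adapt to your own tool’s alerting rules.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Dict, Iterable, List, Tuple


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


@dataclass(frozen=True)
class Alert:
    source: str        # e.g. host or service name
    metric: str        # e.g. "cpu", "disk"
    severity: Severity
    timestamp: float   # seconds since epoch


def route_alerts(
    alerts: Iterable[Alert],
    email_min: Severity = Severity.CRITICAL,  # hypothetical email cut-off
    dedup_window_s: float = 900.0,            # hypothetical 15-minute quiet window
) -> Tuple[List[Alert], List[Alert]]:
    """Split alerts into (emailed, dashboard) lists.

    Every alert goes to the dashboard. Only alerts at or above email_min are
    emailed, and a repeat of the same (source, metric) within dedup_window_s
    seconds of the last email for that pair is not emailed again.
    """
    emails: List[Alert] = []
    dashboard: List[Alert] = []
    last_emailed: Dict[Tuple[str, str], float] = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        dashboard.append(alert)  # everything stays visible on the dashboard
        if alert.severity < email_min:
            continue
        key = (alert.source, alert.metric)
        prev = last_emailed.get(key)
        if prev is not None and alert.timestamp - prev < dedup_window_s:
            continue  # suppress the duplicate email
        last_emailed[key] = alert.timestamp
        emails.append(alert)
    return emails, dashboard
```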

3. Include Automation

Automation can be a very powerful tool and can save the administrator a ton of time. In short, automation makes life better, so long as it’s implemented correctly. Many tools today have an automation feature where you can either create your own automation scripts or choose from a list of common, out-of-the-box ones. This best practice goes along with the first one in this list, prediction and prevention. For example, when the tool notices that a certain VM is running low on memory, it can reach back to vCenter and add more memory before it’s too late, assuming it has been configured to do so. This makes life much easier, but proceed with caution: you don’t want your monitoring tool doing too much, and it’s easy to be overly aggressive with automation.
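A guarded version of that memory example might look like the sketch below. It only decides whether to grow a VM; the actual call to vCenter (or whatever remediation hook your tool exposes) is deliberately left out, because it depends on your tooling. The thresholds, the cap, the cooldown, the default dry-run mode, and the names `RemediationPolicy` and `plan_memory_increase` are illustrative assumptions meant to show how to keep automation from being overly aggressive.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RemediationPolicy:
    # All values are illustrative assumptions, not vendor defaults.
    trigger_pct: float = 90.0    # act only at or above this memory usage
    step_mb: int = 1024          # how much memory to add per action
    max_mb: int = 16384          # never grow the VM beyond this size
    cooldown_s: float = 3600.0   # minimum gap between actions on one VM
    dry_run: bool = True         # log the decision instead of acting


@dataclass
class VmState:
    name: str
    memory_mb: int
    usage_pct: float
    last_action_ts: Optional[float] = None
    log: List[str] = field(default_factory=list)


def plan_memory_increase(
    vm: VmState, policy: RemediationPolicy, now: float
) -> Optional[int]:
    """Return the new memory size to request, or None if no action should be taken.

    The caller is responsible for passing the returned size to the real
    remediation mechanism (hypothetical here); this function never changes
    vm.memory_mb itself.
    """
    if vm.usage_pct < policy.trigger_pct:
        return None
    if vm.last_action_ts is not None and now - vm.last_action_ts < policy.cooldown_s:
        vm.log.append(f"{vm.name}: in cooldown, escalate to a human instead")
        return None
    target = min(vm.memory_mb + policy.step_mb, policy.max_mb)
    if target <= vm.memory_mb:
        vm.log.append(f"{vm.name}: at memory cap {policy.max_mb} MB, escalate")
        return None
    if policy.dry_run:
        vm.log.append(f"{vm.name}: would grow {vm.memory_mb} -> {target} MB (dry run)")
        return None
    vm.last_action_ts = now
    vm.log.append(f"{vm.name}: requesting {vm.memory_mb} -> {target} MB")
    return target
```

Starting in dry-run mode lets you review the log for a while before letting the automation act, and anything that hits the cap or the cooldown is handed back to a human.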

4. Documentation Saves the Day

Document, document, document everything you do with your systems monitoring tool. The last thing you want is to have an alert come up and the night shift guy on your operations team not know what to do with it. “Ah, I’ll just acknowledge the alarm and reset it to green, I don’t even know what IOPS are anyways.” Yikes! If you have a “run book” or manual that outlines everything about the tool, where to look for alerts, who to call, how to log in, and so on, then you can relax and know that if something goes wrong, you can rely on the guy with the manual to know what to do. Ensure that you also track changes to the document because you want to monitor what changes are being made and check that they are legit, approved changes.
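A runbook can be as simple as a lookup from alert name to documented steps and a contact, with a safe default for anything undocumented so that the night shift escalates instead of resetting the alarm to green. The entry below and the names `RUNBOOK`, `RunbookEntry`, and `lookup` are hypothetical; keeping such a file in version control is one way to get the change tracking described above.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class RunbookEntry:
    summary: str
    steps: Tuple[str, ...]
    contact: str


# Hypothetical entries; a real runbook would live in version control
# so that every change is tracked and reviewed.
RUNBOOK: Dict[str, RunbookEntry] = {
    "high_disk_latency": RunbookEntry(
        summary="Storage latency / IOPS above threshold",
        steps=(
            "Check the datastore performance view",
            "Identify the busiest VMs on that datastore",
            "Page the storage on-call if latency persists for 15 minutes",
        ),
        contact="storage-oncall",
    ),
}

# Fallback used for any alert that has no documented response yet.
DEFAULT_ENTRY = RunbookEntry(
    summary="No runbook entry for this alert",
    steps=(
        "Do not acknowledge or reset the alert",
        "Escalate to the on-call administrator",
    ),
    contact="infra-oncall",
)


def lookup(alert_name: str) -> RunbookEntry:
    """Return the documented response for an alert, or the escalation default."""
    return RUNBOOK.get(alert_name, DEFAULT_ENTRY)
```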

5. Choose Wisely

Last, but definitely not least, pick the right tool for the job. If you migrated your entire workload to the cloud, don’t mess around with an on-premises solution for systems monitoring. Just use the cloud provider’s proprietary tool and run with it. That being said, get educated on their tool and make sure you can customize it to your liking. Don’t pick a tool based on price alone. Shop around and focus on the options and customization the tool offers. Always choose a tool that achieves your organization’s goals in systems monitoring. The latest isn’t always the greatest.
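A weighted scoring matrix is one simple way to avoid choosing on price alone: score each candidate on the criteria that matter to you, weight them, and compare. The criteria, weights, tool names, and scores below are made-up placeholders, not an evaluation of any real product, and `weighted_score` is a hypothetical helper.

```python
from typing import Dict


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-criterion scores (e.g. 1-5) into one weighted average."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


# Hypothetical weights: price matters, but less than fit and customization.
WEIGHTS = {"platform_fit": 3, "customization": 3, "alerting": 2, "automation": 1, "price": 1}

# Hypothetical scores from an evaluation; replace with your own findings.
CANDIDATES = {
    "tool_a": {"platform_fit": 5, "customization": 4, "alerting": 4, "automation": 3, "price": 2},
    "tool_b": {"platform_fit": 3, "customization": 2, "alerting": 3, "automation": 2, "price": 5},
}

# Highest weighted score first
ranking = sorted(
    CANDIDATES,
    key=lambda name: weighted_score(CANDIDATES[name], WEIGHTS),
    reverse=True,
)
```

With these placeholder numbers, `tool_a` scores 4.0 and `tool_b` scores 2.8, so the cheaper tool loses despite its top price score.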

Putting these monitoring best practices in place is a smart way to ensure your tool of choice performs at its best and gives you the metrics you need to feel good about what’s going on in your data center.

26 Comments
Level 14

Good article and it all makes sense. However, how many of us actually get any choice in what tool we use? I started my current role a year ago and found a half-installed (and half-baked) instance of SolarWinds SAM. There was no handover, as the previous incumbent had left, and there was no documentation. I was just told to get it sorted in any spare time I had between regular third-line duties. I've now got it monitoring the correct servers, reporting correctly, and alerting only when really necessary (and only alerting the correct people). I don't automate memory or CPU increases, as we are already dangerously overprovisioned on our VMware systems. I do get all the trend analysis and will occasionally increase memory, CPU, and disk space. I quite often get advance warnings of network issues and surprise our network team, who don't know about upcoming issues (their tool doesn't give them this and they won't use NPM). I've got custom dashboards and am looking to create more. And then they ask me what I do all day.

Level 13

Good Advice.

Level 20

Documentation does save the day. Sometimes it's hard for me to slow down enough to document things, but when I do, it often comes in handy later.

I love #5:  Choose Wisely. 

If you remember one of the Indiana Jones movies:

Level 11

I find that your experience is par for the course when it comes to monitoring... documentation is key, especially with positions that have a lot of turnover!

Level 11

Thanks david.botfield

Level 11

Yes, documentation, at least good documentation, does take a lot of your time. However, if it's done right the first time, you only need to make minor updates as the project grows.

Level 11

Yep, I was actually thinking of Indiana Jones and the Last Crusade when I wrote #5.... "that's not the cup of a carpenter..."

Level 10

Nice article. Automation is the one that is the most challenging of all, IMHO.

Level 15

Good points in the write-up. I have found documentation and automation to be the challenges. Human nature is to complete the task and move on, but remembering and taking the time to document is a sign of maturity. Automation is great, but I have found that too many administrators and upper management are scared of it. They want a real human to make decisions rather than trust the machine to self-correct.

Thanks! A great article....

There could be a song title or a band name in there:

"The Cup of a Carpenter" might make an inspirational tune for some church folks.

"Carpenter's Cup" could be the band that sang it.

I've always been a bit skeptical about why a carpenter's son's blood would be collected in a cup that belongs to his family.  Who brings family dishes to a crucifixion?

Some folks are weird . . .

And I'm probably one of them, from the point of view of other cultures.

Level 16

"it is important that you choose a systems monitoring tool that works best for your organization and delivers the metrics that are crucial to its success"

I have used quite a few of the monitoring tools available and have found the SolarWinds suite quite capable and very enjoyable to use.

Level 14

All this talk about documentation is on point, and a point of frustration from my perspective.  At my work it's been a tradition to say "documentation?  what is that?".  I'm trying to make positive changes though!  Good write up. 

Level 11

Carpenter's Cup is an amazing band name... if someone hits it big with that name, I'm going to sue for royalties!

Level 11

I would disagree with you here; there's always someone on the team who is good with scripting and can jump in and write a good script for what you need. I think documentation is the hardest and the worst! It can be very tedious work, especially where there's lots of IT governance.

Level 11

Yep, all it takes is one incident where something goes wrong and there's no documentation for you to realize the importance of documentation. Good points.

Level 11

Good, I'm glad you found something that works for you... that's half the battle!

Level 11

Thanks for reading, smttysmth02gt, and thanks for the kind words. I expected a lot of comments on documentation; it seems to be the bane of every IT pro's existence.

Level 14

Reminds me of a recent incident here where some servers became unavailable due to an obscure issue and the documentation on how to get around this issue was only on one of these servers (a virtual one and we couldn't access the console).  Ended up restoring the files from backup to get the details of how to fix this issue.  All the eggs in one basket.

Level 11

Oh man, that's the worst... I can't say I haven't experienced that before. Make copies of your documentation and put them in several places.

Level 14

Yes, all done before I got here. I'm in the process of showing them how to do things properly, but there is sooooo much stuff done badly that I may be retired before I sort it all out (or behind bars for killing someone who is too stupid to understand even basic common sense).

MVP

Nice write up

Automation is one of the more difficult parts in IT and very fragile without proper change control.

RT

Level 11

Thanks vinay.by!

Level 11

Couldn't agree more; change controls are very necessary but can become a real burden for administrators. Do you have a lot of change control in place?

I still quote the knight using the same diction. Maybe 1 out of every 10 times somebody gets the reference. Alas... poor me.