Despite the relative maturity of monitoring and systems management as a discrete IT discipline, I am asked - year after year and job after job - to give an overview of what monitoring is.
This document is our (SolarWinds') attempt to address that question in a more structured form.
Updated from the original post of two years ago (Monitoring 101), it is intended as a guide to help bring new team members (often fresh out of college or a technical program) up to speed on monitoring concepts quickly. The document, or portions of it, can also serve as a good introduction for a variety of audiences.
"If you have worked in the IT field for more than 15 minutes, the situation described above is neither unique nor rare, even if it is somewhat colorful. Systems crash unexpectedly, users make bizarre claims about how “the internet is slow”, and managers ask for historical statistics that leave you scratching your head, wondering how to collect them in a way that is meaningful and doesn’t consign you to the hell of hitting “refresh” and writing down numbers on paper for half a day, just to get a baseline for a report.
The answer to all these challenges lies in effectively monitoring your environment – collecting statistics and/or checking for error conditions so that you can act or report effectively when needed."
This guide is also available as a Kindle ebook on Amazon.