Quis custodiet ipsos custodes? I personally love this Latin phrase, which translates to "Who will guard the guards themselves?" It's most often used when discussing government or political systems and how to ensure power is not used improperly. In my case, I liken it to monitoring. SolarWinds is the manager of managers in my environment: every system flows through SW in one manner or another. So how do I monitor SW effectively?
I've had a few threads recently about issues I've had with SolarWinds not polling for one reason or another. I understand that things like this can happen; heck, it's the whole reason we monitor computer systems to begin with: things break. What I don't understand is why there are no built-in mechanisms to alert us when things go wrong. Sure, you can monitor services, but in my cases the services were operational and the system still wasn't collecting data. Some alerts do this in a limited capacity, like the Network Engineer Polling Check (Nodes, Volumes, Interfaces), but I can't exactly enable email notification on these and have 4000+ emails go out when an issue like this arises. They are also incomplete: they only monitor portions of NPM, and there are no built-in checks for SAM or SEUM to tell us when a monitor isn't collecting data.
I've had some great help from other users in writing some custom SQL alerts to notify when 50% or more of an engine's nodes fail, but I would think this sort of alerting mechanism would be built in. The SolarWinds suite keeps growing and adding more functionality. We keep adding more and more modules to our environment and introducing more failure points. At some point the system either needs to be 100% rock solid or, at bare minimum, it needs to alert the admins when it has a problem. There should be some semblance of checks and balances.
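For anyone wondering what the "50% or more of an engine's nodes" check boils down to, here is a rough sketch of the logic in Python. It is not the actual SQL alert and it does not touch the SolarWinds database: the function name, the (engine_id, last_poll) input shape, the 10-minute staleness window, and treating "failed" as "not polled recently" are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def engines_with_stale_polling(nodes, now, max_age=timedelta(minutes=10), threshold=0.5):
    """Return {engine_id: stale_fraction} for engines at or above the threshold.

    nodes is an iterable of (engine_id, last_poll) pairs, where last_poll is a
    datetime or None if the node has never been polled. This input shape is
    hypothetical; in practice the rows would come from a query you write.
    """
    total = defaultdict(int)
    stale = defaultdict(int)
    for engine_id, last_poll in nodes:
        total[engine_id] += 1
        # A node counts as stale if it was never polled or its last poll is too old.
        if last_poll is None or now - last_poll > max_age:
            stale[engine_id] += 1

    flagged = {}
    for engine_id, count in total.items():
        fraction = stale[engine_id] / count
        if fraction >= threshold:
            flagged[engine_id] = fraction
    return flagged


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0)
    sample = [
        (1, datetime(2024, 1, 1, 11, 0)),   # stale
        (1, datetime(2024, 1, 1, 11, 58)),  # fresh
        (2, datetime(2024, 1, 1, 11, 59)),  # fresh
        (2, None),                          # never polled, counts as stale
        (2, datetime(2024, 1, 1, 11, 57)),  # fresh
    ]
    for engine_id, fraction in engines_with_stale_polling(sample, now).items():
        print(f"Engine {engine_id}: {fraction:.0%} of nodes have stale polling data")
```

The point of grouping by engine is that a polling outage produces one notification per engine instead of one per node, which sidesteps the 4000+ email problem.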
OK, I'm stepping down off my soapbox for now.