In NPM 9.1, all I needed to worry about when it came to Orion performance was the polling completion percentage and basic CPU/memory metrics on the polling servers themselves. If polling completion was above 99%, I could sleep comfortably knowing that polling was going fairly well and that the database was recording the data I wanted it to record. After upgrading to 10.1 I have to worry about the SNMP polling index as well. Fine, no problem. Everything looks good to me: I have gotten my CPU usage under control (it averages around 30%), polling completion percentage is higher than ever, and the SNMP polling index stays "complete" 99% of the time. But when I started losing data and having services fail, I was told some of it is likely performance related, because diagnostics show my MSMQ queues are filling up and the collector data processor can't keep up. So now I've added APM monitors covering all the MSMQ service and perfmon data.
So my question is: what are the recommended best practices for monitoring overall Orion performance? I don't want any more surprises, so what else do I need to monitor, and what are the SolarWinds recommended thresholds?
What does the rest of the user community do to monitor their Orion performance/health?
Hmm! I was hoping somebody else would respond to this, because I am curious too. I hadn't heard about the MSMQ queues, but I had heard that you should keep an eye on the average disk queue length of the SQL server. I hope others have more ideas.