As a man with no prior knowledge of or experience with monitoring solutions (we are using SolarWinds NPM 11.5), I have finally gotten somewhere and am now at a stage where I can start to tweak the thresholds for the environment.
So far I have got:
- All SQL servers' memory and CPU have very high thresholds (97% and 99% usage), as SQL is resource hungry.
- I have run a report for all servers that gives me a huge amount of resources, and from the notifications that have been sent to the hunt mailbox, I have a reasonable picture of the thresholds I would like.
My question is thus: for all of my servers, would I want to use a high CPU usage threshold (for those 'odd' times that a site is doing work/patching/other)?
Or do I set them to something more usable (e.g. if a CPU had a yearly peak of 5%, I would set the critical level to 20-30%)?
Or is it viable to do this and leave the thresholds as is?
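To make the second option concrete, here is a rough sketch of how a per-server critical level could be worked out from the yearly peaks in the report. It is plain Python and does not touch SolarWinds at all; the function name, the server names, the 20-point headroom, and the 20%/99% floor and cap are all assumptions picked for illustration, so the numbers would need adjusting to the environment.

```python
def suggest_critical_threshold(yearly_peak_pct, headroom_pts=20.0,
                               floor_pct=20.0, cap_pct=99.0):
    """Suggest a critical CPU threshold (percent) from an observed yearly peak.

    Every default here is an illustrative assumption, not a SolarWinds setting.
    """
    if not 0.0 <= yearly_peak_pct <= 100.0:
        raise ValueError("peak must be a percentage between 0 and 100")
    # Add fixed headroom above the observed peak, then clamp to [floor, cap].
    candidate = yearly_peak_pct + headroom_pts
    return max(floor_pct, min(cap_pct, candidate))


# Hypothetical yearly CPU peaks as they might appear in a report.
peaks = {"web01": 5.0, "app01": 62.0, "sql01": 95.0}
thresholds = {name: suggest_critical_threshold(peak)
              for name, peak in peaks.items()}
print(thresholds)
```

With these assumed values, a server that peaked at 5% lands inside the 20-30% range mentioned above, while a busy SQL box is capped at 99% rather than pushed past 100%.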
Any help for my confused, almost-about-to-explode mind would be appreciated.