It seems to me that these individual service/process CPU checks are a waste of time, or am I missing something?
There's actually an important distinction between Service CPU use and System CPU use; you definitely want to monitor both, and your reaction to an alert will differ depending on which object triggered it.
By installing many services/applications on a server, it's quite possible to max out the System CPU while every individual Service CPU figure stays very low. High System CPU usage is typically resolved by:
- Adding more cores
- Moving services/applications off to another server
But high Service CPU use is a different issue altogether. There are certainly cases where high Service CPU is predictable.... consider a SQL Server performing a large, long-running query... but in other cases high Service CPU is likely a symptom of a problem (consider a DNS or DHCP service running at 50%+).
Whether the measurement is actually against a single core likely depends on whether the particular service/process is multi-threaded or not. A single-threaded service/process is only going to use one core, no matter how many are available.
I can see what you're saying, but I'm still not convinced. The service monitors supplied with the Exchange templates have critical thresholds set at 90% CPU. So, if service X is using 90% CPU, an Exchange service X CPU usage alert would trigger, but so would my system CPU alert. There is no point having two alerts for the same event (at least not in this case). So, unless the service CPU check is measuring something different, for example high usage of a single CPU or core, I am tempted to remove the service CPU checks.
Your argument is quite valid for single-service/single-application servers.... except that even single-service/single application servers are not actually SINGLE-service/SINGLE-application servers!
So here's the question:
- What level of CPU utilization is acceptable for the InformationStore service?
- What level of CPU utilization is acceptable for any of the other individual services that run on a MAILBOX server?
- What level of TOTAL CPU utilization is acceptable for a Mailbox server (given that the InformationStore service is NOT the only service/process running on that server)?
Personally, my best suggestion is this:
- Make use of DYNAMIC BASELINES. Find out what the *actual* per-service CPU utilization and per-SYSTEM CPU utilization of that server is.
- Set the baselines based on actual usage, and set the alerts based on the standard deviations from those baselines.
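The baseline approach above can be sketched in a few lines of Python. This is only an illustration of the idea, not how SAM computes its baselines internally; the sample history, the function name, and the 2-sigma default are my assumptions.

```python
from statistics import mean, stdev

def baseline_alert(samples: list[float], latest: float, n_sigma: float = 2.0) -> bool:
    """Alert when the latest reading deviates more than n_sigma
    standard deviations from the observed baseline."""
    mu = mean(samples)
    sigma = stdev(samples)
    return abs(latest - mu) > n_sigma * sigma

# A service that normally idles around 10% CPU:
history = [9.0, 11.0, 10.0, 10.5, 9.5, 10.0, 11.5, 9.0]
print(baseline_alert(history, 10.8))  # within the normal band -> False
print(baseline_alert(history, 45.0))  # far outside it         -> True
```

The point is that the threshold comes from what the server *actually does*, not from a guessed-at static percentage.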
As a result, two things will happen:
- If any SERVICE/PROCESS deviates from the NORMAL behavior of that system you'll get an alert.
- If the SYSTEM deviates from the NORMAL behavior of that system you'll get an alert.
Are any of those two scenarios undesirable?
In reality, it will rarely occur that BOTH alerts trigger simultaneously, but if they do.... well... the system is probably at 100% CPU utilization for a notable period of time, and you should get alerted to kingdom come in that scenario. :-)
And, to your other point: quite a few services/processes are, in fact, still single-threaded, so maxing out that service/process on a single CORE may, in fact, consume only a small percentage of the SYSTEM CPU resources actually available.
Consider a single-threaded service/process on a dual-core/single-socket system. In this scenario, 90% service CPU utilization is only 45% system CPU utilization.
Consider a single-threaded service/process on a quad-core/single-socket system or a dual-core/dual-socket system. In this scenario, 90% service CPU utilization is only 22.5% (less than a quarter) system CPU utilization.
Ergo, making this determination properly requires evaluating the threading model of every service/process running on a system, and comparing that to the core count/socket count of the system.
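The arithmetic above boils down to one division. A tiny sketch (the function name is illustrative, and this only holds for a fully single-threaded process pegging one core):

```python
def service_share_of_system(per_core_pct: float, total_cores: int) -> float:
    """Convert a single-threaded service's CPU reading (percent of ONE core)
    into its share of the system's total CPU capacity."""
    return per_core_pct / total_cores

# A single-threaded service pegged at 90% of its core:
print(service_share_of_system(90, 2))  # dual-core/single-socket   -> 45.0
print(service_share_of_system(90, 4))  # quad-core, or 2x2 sockets -> 22.5
```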
Personally.... I gots better things to do with my time than deep analysis on what should/could be utilized on a server.
I'd much rather [a] evaluate what IS being used, and [b] alert me when that utilization (service/process or system) deviates from the norm.
I think we might not be understanding each other?
I guess what I'm really asking is - what exactly does "Service CPU usage" report in the statistic?
If service X is reported to be using 90% CPU, will the system CPU usage be 90% plus the sum of the usage of all the other processes? Or is the 90% perhaps only the usage on the cores on which the service is running? I don't know the answer to that question, which is what I was trying to get across in the original text, perhaps not very well.
If the Service CPU usage statistic IS the percentage of total system CPU that service X is using, then I'm afraid there is no point in monitoring both the service CPU use and the system CPU use for alerting purposes, although I guess you might want to monitor it for trending, with no thresholds set.
I could, as you say, baseline the services to see what normal usage is, but to be honest the Exchange team only want alerts when there are problems. Temporary deviations from "normal" CPU usage for services would probably not be seen as a problem, and the alerts, I'm sure, would not be welcomed by the team.
System CPU/Memory/etc. is polled only once every 10 minutes, while Processes & Services in SAM are polled every 5 minutes. Also, if the node is polled via SNMP, then what you are seeing is a calculated CPU average between two polls, since it requires two polls via SNMP to calculate CPU utilization; RPC/WMI provide the exact usage at the time of the poll, with no calculation or second poll required. Additionally, most customers use SAM's Process and Service Monitors primarily for availability (up/down), and alerting that a process or service is down isn't possible unless it's being monitored by SAM. The last important distinction is that SAM's Process & Service Monitors provide detailed historical resource consumption/utilization information, whereas SAM's Top XX Processes by CPU/Memory/etc. alert provides only a snapshot in time.
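To illustrate why SNMP needs two polls: many agents expose CPU time as ever-increasing counters, so a utilization percentage is only defined over the interval between two samples. This is a hedged sketch of that delta calculation; the parameter names are illustrative, not real SNMP OIDs.

```python
def cpu_pct_between_polls(busy1: int, total1: int, busy2: int, total2: int) -> float:
    """Average CPU % over the interval bounded by two counter samples:
    (change in busy ticks) / (change in total ticks)."""
    return 100.0 * (busy2 - busy1) / (total2 - total1)

# First poll: 4000 busy of 10000 total ticks.
# Next poll:  7000 busy of 20000 total ticks.
print(cpu_pct_between_polls(4000, 10000, 7000, 20000))  # -> 30.0
```

A WMI poll, by contrast, can hand back a ready-made utilization figure in a single query, which is the distinction drawn above.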
If all you're concerned about is resource consumption of the system as a whole, you have no need or desire to view historical resource consumption for individual processes or services, and you don't need to alert on process or service availability (up/down), then no, you don't need to use SAM's Process and Service Monitors.
Thanks for the response. We use WMI for monitoring most of our Windows servers. We currently use SolarWinds mainly for alerting on issues, but we are moving towards using it for trending & analysis as well. I think I will probably leave the service monitors in place but remove the thresholds, so we will have statistics for reporting but not multiple alerts for what might be a single problem.
Thanks again both for your replies.