I would like to shed some light on a recent change in NPM: the Top 10 page now displays the average CPU utilization across a multi-CPU chassis rather than reporting each CPU individually, as it used to.
We are monitoring around 60 Cisco 7609 boxes which constitute part of our IP/MPLS core. We also monitor Cisco CRS1s, Cisco 6500, 3750, 3560, 2960 switches and more.
As many of you know, the 7609 chassis is modular, and each module has its own processor in addition to the main supervisor engine CPU.
The most common causes of high CPU in such a chassis are traffic being process switched instead of hardware switched, and excessive broadcasts or other types of traffic hitting the control plane directly.
When this happens, only the supervisor engine CPU takes the hit. It can go up to 99% while the other modules' processors stay at around 5%. In such cases the box's performance is severely degraded, and this is exactly when our NOC needs to see it and take action.
We have actually had cases where some boxes were hit and our NOC was not aware of it, because NPM only shows the average CPU utilization for the chassis.
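To put rough numbers on it, here is a minimal sketch in Python. The readings, the module names, and the 90% alert threshold are made up for illustration and are not taken from NPM or from a real box. It only shows how an averaged view can stay far below an alert threshold while one CPU is saturated.

```python
from statistics import mean

# Hypothetical per-CPU readings (percent) for one chassis:
# the supervisor engine is saturated, the module CPUs are nearly idle.
cpu_load = {
    "supervisor": 99,
    "module1": 5,
    "module2": 5,
    "module3": 5,
    "module4": 5,
}


def summarize(loads, threshold=90):
    """Compare the chassis-average view with the per-CPU view.

    The threshold is an assumed alerting level, not an NPM default.
    """
    average = mean(loads.values())
    busiest_name, busiest_value = max(loads.items(), key=lambda item: item[1])
    return {
        "average": average,
        "busiest": (busiest_name, busiest_value),
        "average_alerts": average >= threshold,
        "per_cpu_alerts": busiest_value >= threshold,
    }


summary = summarize(cpu_load)
print(summary)
```

With these example values the chassis average is 23.8%, so an alert based on the average never fires, while the per-CPU view flags the supervisor at 99%.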
I know I can set up a custom poller that reads the main CPU utilization, but that defeats the purpose. NPM used to provide this information and should still be able to.
The administrator should have the option to choose between seeing individual CPU information and the chassis average.
Now we have to configure a new custom poller with the CPU utilization OID. We shouldn't have to do this when we have NPM!
Would the development team please look into this?