
CORE CPU MONITORING

We have a server with 4 CPUs on which CPU1 reached 100% and the system hung. Sometimes CPU2, CPU3, or CPU4 reaches 100% instead, and sometimes two CPUs reach 100% at once. On all of these occasions the system hung.

Because SolarWinds monitors the average across the 4 CPUs rather than each CPU individually, these spikes are not caught and the system hangs every time.

Is there a way to monitor each CPU and alert the server owner when any single CPU reaches its Warning/Critical threshold? Please advise.
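A quick sketch of why a node-level average hides this situation: with one core pinned and the other three nearly idle, the average stays well under a typical threshold. The sample loads and the 90% threshold below are assumptions for illustration, not figures from the server in question.

```python
# Hypothetical per-core load samples (percent) for a 4-CPU server.
core_loads = {1: 100.0, 2: 6.0, 3: 4.0, 4: 10.0}

CRITICAL = 90.0  # assumed critical threshold, in percent

# What an average-based monitor sees.
average = sum(core_loads.values()) / len(core_loads)

# What a per-core check sees.
hot_cores = [cpu for cpu, load in core_loads.items() if load >= CRITICAL]

print(f"average load: {average:.1f}%")
print(f"average breaches threshold: {average >= CRITICAL}")
print(f"cores at or above threshold: {hot_cores}")
```

Here the average is 30%, so an average-based alert stays silent even though CPU1 is saturated.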

  • Yes, you need to query the Orion database to get CPU metrics per core:

    SELECT n.Caption, cml.TimeStampUTC, cml.CPUIndex, cml.MaxLoad, cml.AvgLoad
      ,'/Orion/NetPerfMon/NodeDetails.aspx?NetObject=N:' + CAST(n.NodeID AS varchar(256)) AS [DetailsURL]
      FROM CPUMultiLoad cml
      INNER JOIN Nodes n ON n.NodeID = cml.NodeID
      WHERE cml.NodeID = <NodeID of server in question>
      AND cml.MaxLoad > 99
      AND cml.TimeStampUTC > DATEADD(mi, -10, GETUTCDATE())   -- TimeStampUTC is stored in UTC, so compare against GETUTCDATE() rather than getdate()
      ORDER BY cml.TimeStampUTC DESC

    This will show any core that has exceeded 99% utilisation in the past 10 minutes.
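The filter in the query can be sketched outside the database to show why the reference clock matters when the timestamps are stored in UTC. The rows, dates, and the `hot_samples` helper below are made up for illustration; only the 99% threshold and the 10-minute window come from the query.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rows standing in for CPUMultiLoad: (cpu_index, timestamp_utc, max_load)
def hot_samples(rows, now, window_minutes=10, threshold=99):
    """Return rows newer than the window whose max load exceeds the threshold."""
    cutoff = now - timedelta(minutes=window_minutes)
    return [r for r in rows if r[2] > threshold and r[1] > cutoff]

now_utc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
rows = [
    (0, now_utc - timedelta(minutes=3), 100),   # recent and hot: kept
    (1, now_utc - timedelta(minutes=3), 42),    # recent but cool: dropped
    (2, now_utc - timedelta(minutes=25), 100),  # hot but too old: dropped
]

# Correct: "now" is taken in UTC, matching the stored timestamps.
recent = hot_samples(rows, now_utc)

# Mistake: a local clock 5 hours behind UTC is used as "now". The cutoff moves
# 5 hours into the past, so the stale sample from core 2 is wrongly included.
local_as_utc = now_utc - timedelta(hours=5)
skewed = hot_samples(rows, local_as_utc)

print([r[0] for r in recent])
print([r[0] for r in skewed])
```

A server clock ahead of UTC has the opposite effect: the cutoff lands in the future and no rows match at all.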

    Next you need to wrap this into an alert. This is a little harder, as node alerts need to be based on the Nodes table. Open the alert's Trigger Condition, set it to Custom SQL Alert (Advanced), and you'll see what I mean.

    Select Node under "Set up your SQL condition" and try this SQL underneath the pre-populated grey box:

    INNER JOIN CPUMultiLoad ON Nodes.NodeID = CPUMultiLoad.NodeID
    WHERE CPUMultiLoad.MaxLoad > 99
    AND CPUMultiLoad.TimeStampUTC > DATEADD(mi, -10, GETUTCDATE())

    It should look like this: [screenshot: multicpu_trigger.JPG]

    That will trigger for any node on which at least one CPU core has exceeded 99% utilisation within the last 10 minutes.

    Adjust the threshold and time window to suit your environment and requirements. If results look offset by a few hours, check the timezone handling in the DATEADD comparison.

    I hope it helps.

  • How can we also get the triggered core, or the utilization values of all cores, in the alert email?

  • I'm not sure, but I think you can see CPU utilization for each individual core on the node's Vital Stats page. I have seen it for network devices.
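On the question of getting the core values into the mail: how Orion exposes them to the alert message is not covered in this thread. As a rough sketch of what the message body could look like, assuming the per-core rows have already been fetched with a query like the one earlier in the thread, the values could be formatted as below. `format_core_report`, the node name, and the sample rows are all hypothetical.

```python
def format_core_report(node, rows, threshold=99):
    """Build a plain-text summary from (cpu_index, max_load) rows, flagging hot cores."""
    lines = [f"Per-core CPU load for {node}:"]
    for cpu_index, max_load in sorted(rows):
        flag = "  <-- over threshold" if max_load > threshold else ""
        lines.append(f"  CPU {cpu_index}: {max_load}%{flag}")
    return "\n".join(lines)

# Hypothetical sample: one row per core, as (CPUIndex, MaxLoad).
sample = [(2, 12), (0, 100), (1, 35), (3, 8)]
report = format_core_report("server01", sample)
print(report)
```

Listing every core and flagging only the ones over the threshold answers both variants of the question (the triggered core, or all cores) in a single message.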