CPU monitoring

I have been tasked with trying to find a way to report/alert on a specific CPU situation. We are looking to report on devices that have had the same % CPU usage for a specific period of time. For example a server that has had a consistent CPU usage of 28% for the last 48 hours. Is there any way to do this in Orion?

  • if you know the specific time, you should be able to achieve something like this with a count of the records meeting your expectations.

    assuming a default 10min statistic polling interval; you'll have 288 records (+/- 1 record) in a 48 hour window.

    SELECT 
        NodeID
        ,COUNT(1)
    FROM Orion.CPULoad
    WHERE AvgLoad = 28
    AND DateTime > GETUTCDATE()-2
    GROUP BY NodeID
    HAVING COUNT(1) >= 287

    that query will return how many times the avgLoad result of a CPU Poll for a node was = 28 over the last 2 days, and then filter out only those nodes with >= 287 records that match the 28% load.

    if you use the alert engine; it will look a little different because there's a default top of query that you have to account for.

    Start with a "Node" type alert and you'll end up with something like this:

    -- This is essentially what's at the top (I can't honestly recall the exact syntax atm)
    SELECT E0.NodeID, E0.Uri FROM Orion.Nodes

    -- Then you will add this to the text box below
    JOIN (
        SELECT
            NodeID
            ,COUNT(1)
        FROM Orion.CPULoad
        WHERE AvgLoad = 28
        AND DateTime > GETUTCDATE()-2
        GROUP BY NodeID
        HAVING COUNT(1) >= 287
    ) AS [cpu] ON cpu.NodeID = E0.NodeID

    hope this helps; if that's not the exact syntax and you need help, let me know here and I will grab the precise syntax for you later

  • Zack, thank you for your detailed answer. My phrasing of the question may not have been right. I used 28% as an example, however we do not want to specify any percentage.The goal of this is to attempt to find "frozen" devices. The CPU can be of any percentage including 0. We would want to find any device who's CPU percentage hasn't changed for x amount of hours, and in my example I used 48. Would there be a way to specify that?

  • Here is a possible starting point:

    SELECT
        nc.NodeID,
        nc.AvgLoad,
        nc.ObservationCount
    FROM (
        -- Get counts grouped just by node.
        SELECT
            NodeID,
            COUNT(*) AS ObservationCount
        FROM Orion.CPULoad
        WHERE ObservationTimestamp > ADDDAY(-2, GETUTCDATE())
        GROUP BY NodeID
    ) AS n
    INNER JOIN (
        -- Get counts grouped by the combination of node and average load.
        SELECT
            NodeID,
            AvgLoad,
            COUNT(*) AS ObservationCount
        FROM Orion.CPULoad
        WHERE ObservationTimestamp > ADDDAY(-2, GETUTCDATE())
        GROUP BY NodeID, AvgLoad
    ) AS nc
        -- Match on NodeID and count.  When the counts match, there aren't any other
        -- average load measurements for this node in the last two days.
        ON nc.NodeID = n.NodeID
        AND nc.ObservationCount = n.ObservationCount
    -- To avoid noise, you might want a minimum number of observations to avoid reporting new nodes, etc.
    WHERE nc.ObservationCount > 5
  • Thank you for this input. I am going to give this a shot when I have a moment.