cancel
Showing results for 
Search instead for 
Did you mean: 
Create Post
Level 7

CPU monitoring

Jump to solution

I have been tasked with trying to find a way to report/alert on a specific CPU situation. We are looking to report on devices that have had the same % CPU usage for a specific period of time. For example a server that has had a consistent CPU usage of 28% for the last 48 hours. Is there any way to do this in Orion?

0 Kudos
1 Solution
Level 13

Here is a possible starting point:

SELECT
    nc.NodeID,
    nc.AvgLoad,
    nc.ObservationCount
FROM (
    -- Get counts grouped just by node.
    SELECT
        NodeID,
        COUNT(*) AS ObservationCount
    FROM Orion.CPULoad
    WHERE ObservationTimestamp > ADDDAY(-2, GETUTCDATE())
    GROUP BY NodeID
) AS n
INNER JOIN (
    -- Get counts grouped by the combination of node and average load.
    SELECT
        NodeID,
        AvgLoad,
        COUNT(*) AS ObservationCount
    FROM Orion.CPULoad
    WHERE ObservationTimestamp > ADDDAY(-2, GETUTCDATE())
    GROUP BY NodeID, AvgLoad
) AS nc
    -- Match on NodeID and count.  When the counts match, there aren't any other
    -- average load measurements for this node in the last two days.
    ON nc.NodeID = n.NodeID
    AND nc.ObservationCount = n.ObservationCount
-- To avoid noise, you might want a minimum number of observations to avoid reporting new nodes, etc.
WHERE nc.ObservationCount > 5

View solution in original post

4 Replies
Level 13

Here is a possible starting point:

SELECT
    nc.NodeID,
    nc.AvgLoad,
    nc.ObservationCount
FROM (
    -- Get counts grouped just by node.
    SELECT
        NodeID,
        COUNT(*) AS ObservationCount
    FROM Orion.CPULoad
    WHERE ObservationTimestamp > ADDDAY(-2, GETUTCDATE())
    GROUP BY NodeID
) AS n
INNER JOIN (
    -- Get counts grouped by the combination of node and average load.
    SELECT
        NodeID,
        AvgLoad,
        COUNT(*) AS ObservationCount
    FROM Orion.CPULoad
    WHERE ObservationTimestamp > ADDDAY(-2, GETUTCDATE())
    GROUP BY NodeID, AvgLoad
) AS nc
    -- Match on NodeID and count.  When the counts match, there aren't any other
    -- average load measurements for this node in the last two days.
    ON nc.NodeID = n.NodeID
    AND nc.ObservationCount = n.ObservationCount
-- To avoid noise, you might want a minimum number of observations to avoid reporting new nodes, etc.
WHERE nc.ObservationCount > 5

View solution in original post

Level 7

Thank you for this input. I am going to give this a shot when I have a moment.

0 Kudos
Level 15

if you know the specific time, you should be able to achieve something like this with a count of the records meeting your expectations.

assuming a default 10min statistic polling interval; you'll have 288 records (+/- 1 record) in a 48 hour window.

SELECT 
    NodeID
    ,COUNT(1)
FROM Orion.CPULoad
WHERE AvgLoad = 28
AND DateTime > GETUTCDATE()-2
GROUP BY NodeID
HAVING COUNT(1) >= 287

that query will return how many times the avgLoad result of a CPU Poll for a node was = 28 over the last 2 days, and then filter out only those nodes with >= 287 records that match the 28% load.

if you use the alert engine; it will look a little different because there's a default top of query that you have to account for.

Start with a "Node" type alert and you'll end up with something like this:

-- This is essentially what's at the top (I can't honestly recall the exact syntax atm)
SELECT E0.NodeID, E0.Uri FROM Orion.Nodes

-- Then you will add this to the text box below
JOIN (
    SELECT
        NodeID
        ,COUNT(1)
    FROM Orion.CPULoad
    WHERE AvgLoad = 28
    AND DateTime > GETUTCDATE()-2
    GROUP BY NodeID
    HAVING COUNT(1) >= 287
) AS [cpu] ON cpu.NodeID = E0.NodeID

hope this helps; if that's not the exact syntax and you need help, let me know here and I will grab the precise syntax for you later

Level 7

Zack, thank you for your detailed answer. My phrasing of the question may not have been right. I used 28% as an example, however we do not want to specify any percentage.The goal of this is to attempt to find "frozen" devices. The CPU can be of any percentage including 0. We would want to find any device who's CPU percentage hasn't changed for x amount of hours, and in my example I used 48. Would there be a way to specify that?

0 Kudos