How can I create a weekly report for every time a node crosses 100% CPU utilization and when it resets?

I would like to create a report that runs weekly and shows how many times any of my nodes cross the 100% CPU utilization threshold, and how long it takes before they reset. I'm trying to figure out patterns of CPU utilization and minimize alerts that regularly reset quickly. I know how to read (but not write) SQL/SWQL, so if someone can give me some pointers, I would really appreciate it.

  • So there are a couple of ways to approach this.  You could write the report to search through every CPU poll looking for the times when the system goes above or below the threshold, but it is a lot more computationally efficient to set up an alert at that threshold and then just diff the timestamps between the alert triggers and resets.  I've posted this custom SWQL report a couple of times already on THWACK; take it, filter it to your CPU load alerts, and it should do what you are asking.

    --report on alerts triggered
    select ac.Name
    ,ah.Message
    ,'/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:'+ToString(AlertObjectID) as [_linkfor_Name]
    ,EntityCaption as [Trigger Object]
    ,EntityDetailsUrl as [_linkfor_Trigger Object]
    ,case
    WHEN RelatedNodeCaption=EntityCaption THEN 'Self'
    When RelatedNodeCaption!=EntityCaption THEN RelatedNodeCaption
    End as [Parent Node]
    ,RelatedNodeDetailsUrl as [_linkfor_Parent Node]
    ,'/Orion/images/StatusIcons/Small-' + p.StatusIcon AS [_IconFor_Parent Node]
    ,tostring(tolocal(ah.TimeStamp)) as [Trigger Time]
    ,case when ack.timestamp is null then 'N/A'
    else tostring(minutediff(ah.TimeStamp,ack.timestamp))
    end as [Minutes Until Acknowledged]
    ,ack.Message as [Note]
    ,case when reset.timestamp is null then 'N/A'
    else tostring(minutediff(ah.TimeStamp,reset.timestamp))
    end as [Minutes Until Reset]
    FROM Orion.AlertHistory ah
    left join Orion.AlertObjects ao on ao.alertobjectid=ah.alertobjectid
    left join Orion.AlertConfigurations ac on ac.alertid=ao.alertid
    left join Orion.Actions a on a.actionid=ah.actionid
    left join Orion.Nodes p on p.nodeid=RelatedNodeID
    left join (select timestamp, AlertActiveID, AlertObjectID,message from orion.alerthistory ah where eventtype=2) ack on ack.alertactiveid=ah.AlertActiveID and ack.alertobjectid=ah.AlertObjectID
    left join (select timestamp, AlertActiveID, AlertObjectID from orion.alerthistory ah where eventtype=1) reset on reset.alertactiveid=ah.AlertActiveID and reset.alertobjectid=ah.AlertObjectID
    WHERE
    daydiff(ah.timestamp,GETUTCDATE())<30
    and ah.eventtype=0
    --and (ac.Name like '%YourCPUAlert%')
    order by ah.timestamp desc
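
    If you want counts rather than one row per trigger, the same joins can probably be rolled up into a weekly summary. This is an untested sketch, not something from a working report: it assumes the entities and event types used in the report above (eventtype 0 for trigger, 1 for reset), a 7-day window in place of 30, and that your SWQL version accepts aggregates over MinuteDiff. Alerts that have not reset yet have no reset timestamp, so they are counted in the trigger total but should drop out of the average and maximum.

    ```sql
    --weekly summary (sketch): triggers per alert and object, with reset timing
    select ac.Name as [Alert]
    ,ao.EntityCaption as [Trigger Object]
    ,count(ah.AlertActiveID) as [Times Triggered]
    ,avg(minutediff(ah.TimeStamp,reset.timestamp)) as [Avg Minutes Until Reset]
    ,max(minutediff(ah.TimeStamp,reset.timestamp)) as [Max Minutes Until Reset]
    FROM Orion.AlertHistory ah
    left join Orion.AlertObjects ao on ao.alertobjectid=ah.alertobjectid
    left join Orion.AlertConfigurations ac on ac.alertid=ao.alertid
    left join (select timestamp, AlertActiveID, AlertObjectID from orion.alerthistory where eventtype=1) reset on reset.alertactiveid=ah.AlertActiveID and reset.alertobjectid=ah.AlertObjectID
    WHERE
    daydiff(ah.timestamp,GETUTCDATE())<7
    and ah.eventtype=0
    --and (ac.Name like '%YourCPUAlert%')  --placeholder: put your own CPU alert name here
    group by ac.Name, ao.EntityCaption
    order by ac.Name, ao.EntityCaption
    ```

    Objects with a high trigger count and a low average reset time are the likely candidates for a longer trigger delay on the alert.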

  • The great thing about SolarWinds is its visual approach to data.

    I would simply create a chart-based report for CPU utilization across a group of nodes, with the thresholds marked on it. You will be able to see clearly when nodes breach the threshold and when CPU utilization drops back below it. Doing it this way saves you from having to think about coding, since most of the work is already done in the standard SolarWinds reports. You can run this report per node or for a group of nodes.

    If it isn't easy to see how many threshold breaches occur, then it's time to reduce the time period displayed on the graph until the information you want is displayed clearly. You could design some fancy script to do this for you, but I wouldn't bother; I'd just send this to myself once a week and look at the peaks to determine where high CPU utilization occurs.

    Adam @Acmtix

  • I am learning more every day .... thanks for taking the time to share this information - when I take the time, every little piece of code that I collect will help me achieve something bigger and better down the road.  I am really digging this community!  I have had more time to take advantage of THWACK, and it is inevitably making me a better employee!