Report for node down longer than 14 days

I have looked for multiple days...

even watched every youtube video i can find, but i can't find a swql query that will show me all node down for over 14 days...  due to the fact that we have devices that are not servers, it should work for 'IS SERVER=YES' or anyway to make sure I am just looking at down servers...

any help is appreciated...  I have also tried to create a custom report, but can only get the total duration in minutes to be 100805, and i cant change that number...

Parents
  • Here is a SWQL statement I picked up somewhere. I originally had it set for 60 minutes, but I believe it is set to a week now.--


    select n.nodeid, n.caption as [Device]
    -- shows the current status icon
    , '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Device]
    -- makes a clickable link to the node details
    , n.DetailsUrl as [_linkfor_Device]
    -- shows the timestamp of the down event, if there is no timestamp then is says the event was greater than the number of days in your event retention settings
    , isnull(tostring(t2.[Down Event]),concat('Greater than ',(SELECT CurrentValue FROM Orion.Settings where settingid='SWNetPerfMon-Settings-Retain Events'),' days ago')) as [Down Event]
    -- shows the timestamp of the up event, unless the object is still down
    , isnull(tostring(t2.[Up Event]),'Still Down') as [Up Event]
    -- figures out the minutes between the down and up events, if the object is still down it counts from the down event to now, displays 99999 if we cannot accurately determine the original downtime, and
    , isnull(MINUTEDIFF(t2.[Down Event], isnull(t2.[Up Event],GETUTCDATE())),99999) as Minutes


    from orion.nodes n
    left join (SELECT
    -- Device nodeid used for our join
    StartTime.Nodes.NodeID

    -- Down Event time stamp in local time zone
    ,ToLocal(StartTime.EventTime) AS [Down Event]

    -- Up Event time stamp in local time zone
    ,(SELECT TOP 1
    ToLocal(EventTime) AS [EventTime]
    FROM Orion.Events AS [EndTime]
    -- picks the first up event that is newer than the down event for this node
    WHERE EndTime.EventTime >= StartTime.EventTime
    -- EventType 5 is a node up
    AND EndTime.EventType = 5
    AND EndTime.NetObjectID = StartTime.NetObjectID
    AND EventTime IS NOT NULL
    ORDER BY EndTime.EventTime
    ) AS [Up Event]

    -- This is the table we are querying
    FROM Orion.Events StartTime

    -- EventType 1 is a node down
    WHERE StartTime.EventType = 1

    ) t2 on n.NodeID = t2.nodeid


    -- this is how I catch nodes that are down but have aged out of the events table, and where I can only get nodes that are still down.
    where (n.status = 2 AND t2.[Up Event] IS NULL)


    -- If you want to filter the results to only show outages of a minimum duration uncomment the below line, this is set to a week.
    and MINUTEDIFF(isnull(t2.[Down Event],(GETUTCDATE()-30)), isnull(t2.[Up Event],GETUTCDATE())) > 10080


    -- if you want to use this query in a search box of the Custom Query resource uncomment the below line
    --and n.Caption like '%${SEARCH_STRING}%'


    order by t2.[down event] desc

  • Oh, but you wanted 2 weeks, just change 10080 to 20160 near the end of the statement. And to get only IsServer then change the last where line and add--  AND n.isserver = TRUE 

    select n.nodeid, n.caption as [Device]
    -- shows the current status icon
    , '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Device]
    -- makes a clickable link to the node details
    , n.DetailsUrl as [_linkfor_Device]
    -- shows the timestamp of the down event, if there is no timestamp then is says the event was greater than the number of days in your event retention settings
    , isnull(tostring(t2.[Down Event]),concat('Greater than ',(SELECT CurrentValue FROM Orion.Settings where settingid='SWNetPerfMon-Settings-Retain Events'),' days ago')) as [Down Event]
    -- shows the timestamp of the up event, unless the object is still down
    , isnull(tostring(t2.[Up Event]),'Still Down') as [Up Event]
    -- figures out the minutes between the down and up events, if the object is still down it counts from the down event to now, displays 99999 if we cannot accurately determine the original downtime, and
    , isnull(MINUTEDIFF(t2.[Down Event], isnull(t2.[Up Event],GETUTCDATE())),99999) as Minutes


    from orion.nodes n
    left join (SELECT
    -- Device nodeid used for our join
    StartTime.Nodes.NodeID

    -- Down Event time stamp in local time zone
    ,ToLocal(StartTime.EventTime) AS [Down Event]

    -- Up Event time stamp in local time zone
    ,(SELECT TOP 1
    ToLocal(EventTime) AS [EventTime]
    FROM Orion.Events AS [EndTime]
    -- picks the first up event that is newer than the down event for this node
    WHERE EndTime.EventTime >= StartTime.EventTime
    -- EventType 5 is a node up
    AND EndTime.EventType = 5
    AND EndTime.NetObjectID = StartTime.NetObjectID
    AND EventTime IS NOT NULL
    ORDER BY EndTime.EventTime
    ) AS [Up Event]

    -- This is the table we are querying
    FROM Orion.Events StartTime

    -- EventType 1 is a node down
    WHERE StartTime.EventType = 1

    ) t2 on n.NodeID = t2.nodeid


    -- this is how I catch nodes that are down but have aged out of the events table, and where I can only get nodes that are still down.
    where (n.status = 2 AND t2.[Up Event] IS NULL AND n.IsServer=TRUE)


    -- If you want to filter the results to only show outages of a minimum duration uncomment the below line, this is set to a week.
    and MINUTEDIFF(isnull(t2.[Down Event],(GETUTCDATE()-30)), isnull(t2.[Up Event],GETUTCDATE())) > 20160


    -- if you want to use this query in a search box of the Custom Query resource uncomment the below line
    --and n.Caption like '%${SEARCH_STRING}%'


    order by t2.[down event] desc

  • Good work on adjusting your query to fit your needs. Referencing your code block above, I had the AND n.IsServer = True on line 45 (Or to put it outside of the parentheses then it would go on line 46,) instead of on line 42, but if it works for you I won't look to see what the difference might be. 

    group by 
    is used if you do a count or something , Did you want to use order by to sort the results by owner?
    If you wanted to include the Owner and sort them by the owner then on the first line, add n.CustomProperties.owner, before the n.nodeid. Then on the last line add n.CustomeProperties.owner, right after order by.

    Is Model a custom property for you? If so then this might work 
    Right after where you put the AND isServer=True, add AND n.CustomProperties.model <> 'Biomed' If it doesn't work there then add it to line 46. 

    Vendor might be what you are looking for(?) and if so it would just be n.Vendor <> 'Biomed' 

  • Awesome... I have added the 'biomed' to line 46 and it is working...  but still showing pacs devices...

     how to not show the pacs devices?

  • I'm assuming the pacs devices are the Biomed devices? The vendor name must not be Biomed. Not knowing the details of those devices you can add this to your query, after around line 50 depending on what adjustments you have made...

    and n.Caption NOT Like ('pacs%')

    As a side note. If they have been down for more than 90 days should those devices be deleted from your system?

  • Ah....  I was able to get it like this...

    AND n.CustomProperties.INFRASTRUCTURE_TEAM <> 'PACS'

  • Also, giving false positive in this query   showing CCEprd1 as 'Still Down' above, but this node is actually up... IDEAS?

  • In the Swql query it has n.status = 2, which should only include nodes that are down. What is the status in the database for the node? 

    SELECT NodeID, Caption, Status
    FROM Orion.Nodes
    Where Caption='CCEprd1'


    I'm actually having problems recently where the agent stops connecting on multiple servers that I am troubleshooting, where we don't get recent metrics but it shows the node as up. Maybe ping is working, but the agent can't connect?

  • It is also showing nodes up since jul 6...

  • try moving the t2.nodeid is not null statement to outside the parenthesis you currently have it in. Because you are grouping status =2 OR nodeid is not null its picking picking up all of the nodes that have a nodeid that also have an event in the events table. You'll need to ungroup that part of the where statement to get it to work the way you are intending.

  • I changed my script a while ago to use AND instead of OR there. But I bet I messed up the intent of the script when I did that. I'm wondering if I need to start from scratch on my own, or go through line by line again. 

Reply Children