Reboots and Downtime Report (Last 7 Days)

Updated to include Availability from Orion.ResponseTime, calculated Availability (based on minutes down from Orion.NetObjectDowntime), and the number of 'Node rebooted' events from Orion.Events.

It was inspired by a question posted on the Network Performance Monitor (NPM) Forum.

There is plenty of details in the query about how the information was calculated.  There's also a mention (commented out) on how to add filters for Node-based custom properties.

Important Note:

There will be a discrepancy between Availability and the Calculated Availability.  This is based on the polling cycles, the pollers being used, and the way data is stored in the database.  I (personally) prefer the Calculated Availability because it isn't limited to ICMP tests only.

Anonymous
  • BTW - if they are scheduled Unmanaged then polling stops 100% and the platform has no idea if they are or are not "up."

    However, if they are "muted" then the polling still happens and should get flagged as being offline.  The way I understand it when an element is "Muted," the polling still occurs and the alerts still trigger, but the "trigger actions" in the alerts (like sending an email or writing a log) do not take place.

  • These servers have been offline a few times.  I reboot them to patch, they have unscheduled main happen. They are agent based devices. Stumped.  but thanks for getting back to me.

  • Point of clarity:

    "It's not pulling data for any servers that are online" - You mean ones that are online but have never been offline - because it should.  The Orion.NetObjectDowntime entity only records downtime - so anything that's always been up won't have data in that entity - hence no records in the final report.

    If you instead want an "uptime" report - or a 5-9's or something like that, then that'll rely on a slightly different query.  I build this one specifically for Nodes that have been down at some point.

  • Yeah,  If i remove the line below it pulls back my servers that are currently deallocated, but is not pulling data for any servers that are online.

    AND [SI].StatusName = 'Down' -- Where the status is 'Down' (not 'Critical', 'Warning', etc.)

  • The query behind the scenes is pretty good, but by no means fool-proof for every installation.

    SELECT [Nodes].Caption
         , [Nodes].DetailsUrl
         , [Nodes].VendorInfo.Icon
         , [ResponseTime].Day
         , [ResponseTime].Availability
         , [Downtime].MinutesDown
         , [Downtime].CalcAvailability
         , [Events].RebootCount
    -- To add Node Custom Properties use:
    -- , [Node].CustomProperties.<Custom Property Name>
    FROM Orion.Nodes AS [Nodes]
    INNER JOIN (
         SELECT [RT].NodeID
              , DateTrunc('day', [RT].DateTime) AS [Day]
              , AVG([RT].Availability * 1.0) AS Availability
         FROM Orion.ResponseTime AS [RT]
         GROUP BY DateTrunc('day', [RT].DateTime)
              , [RT].NodeID
         ) AS [ResponseTime]
         ON [Nodes].NodeID = [ResponseTime].NodeID
    INNER JOIN (
         SELECT [DT].NodeId
              , DateTrunc('day', [DT].DateTimeUntil) AS [Day]
              , SUM([DT].TotalDurationMin) AS [MinutesDown]
              , ( ( 1.0 - ( SUM([DT].TotalDurationMin) / 1440.0 ) ) * 100.0 ) AS [CalcAvailability]
         FROM Orion.NetObjectDowntime AS [DT]
         INNER JOIN Orion.StatusInfo AS [SI]
              ON [DT].State = [SI].StatusId
                   AND [SI].StatusName = 'Down' -- Where the status is 'Down' (not 'Critical', 'Warning', etc.)
         WHERE [DT].EntityType = 'Orion.Nodes' -- Only show me nodes
              AND [DT].DateTimeUntil IS NOT NULL --Indicates that it's no longer down
         GROUP BY DateTrunc('day', [DT].DateTimeUntil)
              , [DT].NodeId
         ) AS [Downtime]
         ON [ResponseTime].NodeId = [Downtime].NodeID
              AND [ResponseTime].Day = [Downtime].Day
    RIGHT JOIN (
         SELECT [E].Nodes.NodeID
              , COUNT([E].EventID) AS [RebootCount]
              , DATETRUNC('day', [E].EventTime) AS [Day]
         FROM Orion.Events AS [E]
         WHERE [E].EventTypeProperties.Name = 'Node rebooted'
         --   AND [E].Nodes.CustomProperties.>Custom Property Name< = >Custom Property Value<
         GROUP BY DATETRUNC('day', [E].EventTime)
              , [E].Nodes.NodeID
         ) AS [Events]
         ON [ResponseTime].NodeID = [Events].NodeID
              AND [ResponseTime].Day = [Events].Day
    WHERE [Nodes].ObjectSubType <> 'ICMP'
    -- Required for Orion Reports
       AND [ResponseTime].Day BETWEEN ${FromTime} AND ${ToTime}
    -- To filter on Node Custom Properties add
    --  AND [Node].CustomProperties.<Custom Property Name> = <Custom Property Value>

    You can probably read this for yourself to see how it's filtered.  It ignores "ICMP" (Ping only) nodes and ignores devices that are currently still down (made the queries easier). Other than that, it's pretty simple in it's entity queries.  It pulls data from Orion.Nodes, Orion.Events, and Orion.NetObjectDowntime.