6 Replies Latest reply on Sep 1, 2016 2:14 PM by jdumke

    Polling and reporting real uptime

    Atamido

      Edit: Skip to my reply below to see the fixed SQL report that will show real uptime.  Keep reading for an explanation of why the uptime reported by SolarWinds can be horribly wrong.

      The issue is that the "up time" queried and presented by SolarWinds appears to be from OID 1.3.6.1.2.1.1.3 (sysUpTime), which is a 32-bit INTEGER for the amount of time the SNMP agent has been running on the device, presented in centiseconds (1/100th of a second), and stored in the database under Nodes.SystemUpTime as seconds.  On Windows and Unix servers this is problematic because the SNMP service can restart for a number of reasons without the OS rebooting, which results in SolarWinds showing the wrong up time.  It is also problematic for devices that stay up for a long time as the counter will flip every ~497 days.  We have Cisco devices with uptimes of ~1900 days that are reporting <400 days.

      For our Unix and Windows servers, we added a custom poller for OID 1.3.6.1.2.1.25.1.1 (hrSystemUpTime) to get up time.  It doesn't appear to work on any of our other equipment, but works servers.  Unfortunately Windows 2003-2008 reports through it in milliseconds while Unix/Linux/Windows 2008 R2 all report in centiseconds (1/100th of a second.)

      For network devices, we added a custom poller for OID 1.3.6.1.6.3.10.2.1.3 (snmpEngineTime).  It is a 64-bit INTEGER for the SNMP agent up time in seconds. Unfortunately not all network devices support this OID.

      I've managed to put together a query that will compare SystemUpTime to hrSystemUpTime (if it exists) and use the larger value.  I use variations of this query in Advanced Sql Reports which I then display in parts of the the Orion web interface.  I feel like there is a better way to do this, possibly using variables, but I lack the skill to refine it further and include the snmpEngineTime OID. 

      I cast the numbers as BIGINT everywhere because they kept exceeding the default 32-bit size.  The names of the custom pollers are what Solarwinds says is the name of the OID.  Most of my queries include the full days/hours/minutes/seconds of uptime.  I'd like to even factor in the Nodes.LastSystemUpTimePollUtc column to give potentially even more accurate data.  I exclude "down" nodes as their uptime can't be considered accurate.

       

      SELECT Top 10
      Nodes.NodeID,
      Nodes.Caption,

      CASE
      WHEN CustomNodePollers_CustomPollers.UniqueName = 'hrSystemUpTime'
      THEN CASE
      WHEN (Nodes.MachineType LIKE 'Windows 2003%' OR Nodes.MachineType = 'Windows 2008 Server')
      AND   CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT) > (CAST(Nodes.SystemUpTime AS BIGINT) * 100)
      THEN (CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT)/86400000)
      WHEN  CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT) > (CAST(Nodes.SystemUpTime AS BIGINT) * 100)
      THEN (CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT)/8640000)
      ELSE (CAST(Nodes.SystemUpTime AS BIGINT)/86400)
      END
      ELSE (CAST(Nodes.SystemUpTime AS BIGINT)/86400)
      END AS Days,

      CASE
      WHEN CustomNodePollers_CustomPollers.UniqueName = 'hrSystemUpTime'
      THEN CASE
      WHEN (Nodes.MachineType LIKE 'Windows 2003%' OR Nodes.MachineType = 'Windows 2008 Server')
      AND   CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT) > (CAST(Nodes.SystemUpTime AS BIGINT) * 100)
      THEN (CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT)/3600000 % 24)
      WHEN  CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT) > (CAST(Nodes.SystemUpTime AS BIGINT) * 100)
      THEN (CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT)/360000 % 24)
      ELSE (CAST(Nodes.SystemUpTime AS BIGINT)/3600 % 24)
      END
      ELSE (CAST(Nodes.SystemUpTime AS BIGINT)/3600 % 24)
      END AS Hours,

      CASE
      WHEN CustomNodePollers_CustomPollers.UniqueName = 'hrSystemUpTime'
      THEN CASE
      WHEN (Nodes.MachineType LIKE 'Windows 2003%' OR Nodes.MachineType = 'Windows 2008 Server')
      AND   CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT) > (CAST(Nodes.SystemUpTime AS BIGINT) * 100)
      THEN (CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT)/1000)
      WHEN  CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT) > (CAST(Nodes.SystemUpTime AS BIGINT) * 100)
      THEN (CAST(CustomNodePollerStatus_CustomPollerStatus.Status AS BIGINT)/100)
      ELSE (CAST(Nodes.SystemUpTime AS BIGINT))
      END
      ELSE (CAST(Nodes.SystemUpTime AS BIGINT))
      END AS Seconds


      FROM
      ((Nodes
      LEFT JOIN CustomPollerAssignment CustomNodePollerAssignment_CustomPollerAssignment
      ON (Nodes.NodeID = CustomNodePollerAssignment_CustomPollerAssignment.NodeID
      AND CustomNodePollerAssignment_CustomPollerAssignment.AssignmentName LIKE 'hrSystemUptime%')) 
      LEFT JOIN CustomPollers CustomNodePollers_CustomPollers
      ON (CustomNodePollerAssignment_CustomPollerAssignment.CustomPollerID = CustomNodePollers_CustomPollers.CustomPollerID)) 
      LEFT JOIN CustomPollerStatus CustomNodePollerStatus_CustomPollerStatus
      ON (CustomNodePollerAssignment_CustomPollerAssignment.CustomPollerAssignmentID = CustomNodePollerStatus_CustomPollerStatus.CustomPollerAssignmentID)


      WHERE ((CustomNodePollers_CustomPollers.UniqueName = 'hrSystemUpTime')
      OR CAST(Nodes.SystemUpTime AS BIGINT) >0)
      AND (Nodes.Status <> 2)


      ORDER BY Seconds DESC

        • Re: Polling and reporting real uptime
          Atamido

          I reworked the SQL statement to consolidate all of the logic to one area, and fixed some bugs as well as increasing the accuracy.  The reported uptime now takes into account the time that the node was polled for uptime, so you will see the numbers increment each time you refresh the page.

          It will work just fine without any additional pollers, but adding in the hrSystemUpTime and/or snmpEngineTime custom pollers mentioned above will make them be automatically added to the list, and the largerst value of the available pollers will be used.

          Any system that hasn't responded to SNMP in the past hour will be filtered from the list.

          To see the most recently rebooted systems, change:
          ORDER BY Seconds DESC
          to
          ORDER BY Seconds ASC

          To use this on the Node Details page to show an individual node's uptime, change:
          SELECT TOP 10
          to
          SELECT TOP 1
          and add:
          AND Nodes.NodeID = ${NodeID}
          to the bottom of the WHERE statement

           

          WITH SysUptime (NodeId, Caption, Seconds)
          AS (
          
          SELECT TOP 10
          Nodes.NodeID,
          Nodes.Caption,
          
          CASE
               WHEN (CTP.UniqueName = 'hrSystemUptime')
                    THEN CASE
                         WHEN (Nodes.MachineType LIKE 'Windows 2003%' OR Nodes.MachineType = 'Windows 2008 Server')
                              THEN CASE
                                   WHEN (CAST(CTPS.Status AS BIGINT)/1000 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                        THEN (CAST(CTPS.Status AS BIGINT)/1000 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                                   ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                              END
                         WHEN CAST(CTPS.Status AS BIGINT)/100 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                              THEN (CAST(CTPS.Status AS BIGINT)/100 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                         ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                    END
               WHEN (CTP.UniqueName = 'snmpEngineTime')
                    THEN CASE
                         WHEN (CAST(CTPS.Status AS BIGINT) + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                              THEN (CAST(CTPS.Status AS BIGINT) + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                         ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                    END
               ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
          END AS Seconds
          
          FROM Nodes
          LEFT JOIN CustomPollerAssignment CTPA
          ON (Nodes.NodeID = CTPA.NodeID AND
                    (CTPA.AssignmentName LIKE 'hrSystemUptime%' OR CTPA.AssignmentName LIKE 'snmpEngineTime%'))
          LEFT JOIN CustomPollers CTP
          ON CTPA.CustomPollerID = CTP.CustomPollerID
          LEFT JOIN CustomPollerStatus CTPS
          ON CTPA.CustomPollerAssignmentID = CTPS.CustomPollerAssignmentID
          
          
          WHERE ((CTP.UniqueName IS NOT NULL)
          OR CAST(Nodes.SystemUpTime AS BIGINT) >0)
          AND (Nodes.Status <> 2)
          AND (DATEDIFF(HOUR,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()) < 1)
          
          ORDER BY Seconds DESC
          )
          
          SELECT
          SysUptime.NodeID,
          SysUptime.caption,
          SysUptime.Seconds/86400 AS Days,
          SysUptime.Seconds/3600%24 AS Hours,
          SysUptime.Seconds/60%60 AS Mins,
          SysUptime.Seconds%60 AS Secs
          
          FROM SysUptime
            • Re: Polling and reporting real uptime
              ctopaloglu

              Excellent work!

              • Re: Polling and reporting real uptime
                jdumke

                This is exactly what we've been looking for.  I have the query working inside of the Database Manager making a few modifications for our environment in the AND and adding a final WHERE clause.  But when I attempt to put this into a Resource as a Custom Query I get the error: Error: A query to the SolarWinds Information Service failed.

                 

                WITH SysUptime (NodeId, Caption, Seconds)
                AS (

                SELECT TOP 1000
                Nodes.NodeID,
                Nodes.Caption,

                CASE
                     WHEN (CTP.UniqueName = 'hrSystemUptime')
                          THEN CASE
                               WHEN (Nodes.MachineType LIKE 'Windows 2003%' OR Nodes.MachineType = 'Windows 2008 Server')
                                    THEN CASE
                                         WHEN (CAST(CTPS.RawStatus AS BIGINT)/1000 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                              THEN (CAST(CTPS.RawStatus AS BIGINT)/1000 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                                         ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                    END
                               WHEN CAST(CTPS.RawStatus AS BIGINT)/100 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                    THEN (CAST(CTPS.RawStatus AS BIGINT)/100 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                               ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                          END
                     WHEN (CTP.UniqueName = 'snmpEngineTime')
                          THEN CASE
                               WHEN (CAST(CTPS.RawStatus AS BIGINT) + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                    THEN (CAST(CTPS.RawStatus AS BIGINT) + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                               ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                          END
                     ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                END AS Seconds

                FROM Nodes
                LEFT JOIN CustomPollerAssignment CTPA
                ON (Nodes.NodeID = CTPA.NodeID AND
                          (CTPA.AssignmentName LIKE 'hrSystemUptime%' OR CTPA.AssignmentName LIKE 'snmpEngineTime%'))
                LEFT JOIN CustomPollers CTP
                ON CTPA.CustomPollerID = CTP.CustomPollerID
                LEFT JOIN CustomPollerStatus CTPS
                ON CTPA.CustomPollerAssignmentID = CTPS.CustomPollerAssignmentID

                WHERE ((CTP.UniqueName IS NOT NULL)
                OR CAST(Nodes.SystemUpTime AS BIGINT) >0)
                AND (Nodes.Status <> 2)
                AND (DATEDIFF(HOUR,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()) < 1)
                AND (Closet <> 'Printer')
                AND (Closet <> 'Cameras')
                And (NodesData_Category = 2)

                ORDER BY Seconds DESC
                )

                SELECT
                SysUptime.NodeID,
                SysUptime.caption,
                SysUptime.Seconds/86400 AS Days,
                SysUptime.Seconds/3600%24 AS Hours,
                SysUptime.Seconds/60%60 AS Mins,
                SysUptime.Seconds%60 AS Secs

                FROM SysUptime
                WITH SysUptime (NodeId, Caption, Seconds)
                AS (

                SELECT TOP 1000
                Nodes.NodeID,
                Nodes.Caption,

                CASE
                     WHEN (CTP.UniqueName = 'hrSystemUptime')
                          THEN CASE
                               WHEN (Nodes.MachineType LIKE 'Windows 2003%' OR Nodes.MachineType = 'Windows 2008 Server')
                                    THEN CASE
                                         WHEN (CAST(CTPS.RawStatus AS BIGINT)/1000 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                              THEN (CAST(CTPS.RawStatus AS BIGINT)/1000 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                                         ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                    END
                               WHEN CAST(CTPS.RawStatus AS BIGINT)/100 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                    THEN (CAST(CTPS.RawStatus AS BIGINT)/100 + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                               ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                          END
                     WHEN (CTP.UniqueName = 'snmpEngineTime')
                          THEN CASE
                               WHEN (CAST(CTPS.RawStatus AS BIGINT) + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()) > CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                                    THEN (CAST(CTPS.RawStatus AS BIGINT) + DATEDIFF(SECOND,CTPS.DateTime,GETUTCDATE()))
                               ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                          END
                     ELSE (CAST(Nodes.SystemUpTime AS BIGINT) + DATEDIFF(SECOND,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()))
                END AS Seconds

                FROM Nodes
                LEFT JOIN CustomPollerAssignment CTPA
                ON (Nodes.NodeID = CTPA.NodeID AND
                          (CTPA.AssignmentName LIKE 'hrSystemUptime%' OR CTPA.AssignmentName LIKE 'snmpEngineTime%'))
                LEFT JOIN CustomPollers CTP
                ON CTPA.CustomPollerID = CTP.CustomPollerID
                LEFT JOIN CustomPollerStatus CTPS
                ON CTPA.CustomPollerAssignmentID = CTPS.CustomPollerAssignmentID

                WHERE ((CTP.UniqueName IS NOT NULL)
                OR CAST(Nodes.SystemUpTime AS BIGINT) >0)
                AND (Nodes.Status <> 2)
                AND (DATEDIFF(HOUR,Nodes.LastSystemUpTimePollUtc,GETUTCDATE()) < 1)
                AND (Closet <> 'Printer')
                AND (Closet <> 'Cameras')
                And (NodesData_Category = 2)

                ORDER BY Seconds DESC
                )

                SELECT
                SysUptime.NodeID,
                SysUptime.caption,
                SysUptime.Seconds/86400 AS Days,
                SysUptime.Seconds/3600%24 AS Hours,
                SysUptime.Seconds/60%60 AS Mins,
                SysUptime.Seconds%60 AS Secs

                FROM SysUptime
                WHERE SysUptime.Seconds/86400 > 30

              • Re: Polling and reporting real uptime
                stripet

                Wow!  I am impressed.  Good work and thanks so much for posting this.  I was notified today that our Lastboot monitor is not working for Linux net-snmp nodes.  After investigating, I have found this solution to be the best one.  Thanks a million!