12 Replies Latest reply on Oct 8, 2019 6:30 AM by mesverrum

    Need a resource that shows down nodes and Down time

    casey.carson

      I have a resource on my summary page that shows all of my down servers, but have just been asked to modify it so that it shows how long those devices have been down.

       

      So far I have a real simple view.

       

      Any help will be appreciated. Just need a new column that says how long it has been down.

       

        • Re: Need a resource that shows down nodes and Down time
          christopher.t.jones123

          casey.carson, take a look at this THWACK post by mesverrum. It's a really nice resource and sounds like exactly what you need:

          Node Downtime with Duration and Minimum Length Filtering

            • Re: Need a resource that shows down nodes and Down time
              casey.carson

              I tried that report and it was not showing just my down nodes. Unfortunately, I do not know SQL well enough to make this work for my needs. I really just need the report above with a column showing how long it has been down.

               

              Thank you!

                • Re: Need a resource that shows down nodes and Down time
                  christopher.t.jones123

                  Can you share the query or a screenshot of how you are currently capturing this? Or, if you're willing, try the modified query below from the mentioned article; it should target only down devices. Notice I've changed the final WHERE clause to look only for node status = 2 (which is down).

                   

                  select n.caption as [Device] 
                  -- shows the current status icon 
                  , '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Device] 
                  -- makes a clickable link to the node details 
                  , n.DetailsUrl as [_linkfor_Device] 
                  -- shows the timestamp of the down event, if there is no timestamp then it says the event was greater than the number of days in your event retention settings 
                  , isnull(tostring(t2.[Down Event]),concat('Greater than ',(SELECT CurrentValue FROM Orion.Settings where settingid='SWNetPerfMon-Settings-Retain Events'),' days ago')) as [Down Event] 
                  -- shows the timestamp of the up event, unless the object is still down 
                  , isnull(tostring(t2.[Up Event]),'Still Down') as [Up Event] 
                  -- figures out the minutes between the down and up events; if the object is still down it counts from the down event to now, and displays 99999 if we cannot accurately determine the original downtime 
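                  -- [Down Event] and [Up Event] are already local time (ToLocal), so "now" is taken from GETDATE() rather than GETUTCDATE() to avoid a time zone offset in the result 
                  , isnull(MINUTEDIFF(t2.[Down Event], isnull(t2.[Up Event],GETDATE())),99999) as Minutes 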
                   
                   
                  from orion.nodes n 
                  left join (SELECT     
                  -- Device nodeid used for our join   
                  StartTime.Nodes.NodeID     
                   
                  -- Down Event time stamp in local time zone     
                  ,ToLocal(StartTime.EventTime) AS [Down Event]     
                     
                  -- Up Event time stamp in local time zone     
                  ,(SELECT TOP 1     
                  ToLocal(EventTime) AS [EventTime]     
                  FROM Orion.Events AS [EndTime]     
                  -- picks the first up event that is newer than the down event for this node 
                  WHERE EndTime.EventTime >= StartTime.EventTime   
                  -- EventType 5 is a node up 
                  AND EndTime.EventType = 5     
                  AND EndTime.NetObjectID = StartTime.NetObjectID     
                  AND EventTime IS NOT NULL     
                  ORDER BY EndTime.EventTime     
                  ) AS [Up Event]     
                     
                  -- This is the table we are querying     
                  FROM Orion.Events StartTime     
                     
                  -- EventType 1 is a node down 
                  WHERE StartTime.EventType = 1     
                       
                  ) t2 on n.NodeID = t2.nodeid 
                   
                   
                  -- this is how I catch nodes that are down but have aged out of the events table 
                  where n.status = 2
                   
                   
                  -- If you want to filter the results to only show outages of a minimum duration uncomment the below line 
                  --and MINUTEDIFF(isnull(t2.[Down Event],(GETUTCDATE()-30)), isnull(t2.[Up Event],GETUTCDATE())) >  60 
                   
                   
                  -- if you want to use this query in a search box of the Custom Query resource uncomment the below line 
                  --and n.Caption like '%${SEARCH_STRING}%' 
                   
                   
                  order by t2.[down event] desc 
                    • Re: Need a resource that shows down nodes and Down time
                      casey.carson

                      Thank you for your help! That gave me some great data, but it lists several devices multiple times. Can we get this to show only the devices that are "Still Down", and only devices whose Notification Group custom property is one of three values? The three groups I want in the report are Server Group, Linux Group, and Hosted Technology.

                       

                       

                      Here is what I am using now:

                       

                        • Re: Need a resource that shows down nodes and Down time
                          christopher.t.jones123

                          Try this query:

                           

                          select n.caption as [Device]  
                          -- shows the current status icon  
                          , '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Device]  
                          -- makes a clickable link to the node details  
                          , n.DetailsUrl as [_linkfor_Device]
                          --shows Custom Property "Notification Group"
                          , n.CustomProperties.Notification_Group as [Notification Group]
                          -- shows the timestamp of the down event, if there is no timestamp then it says the event was greater than the number of days in your event retention settings  
                          , isnull(tostring(t2.[Down Event]),concat('Greater than ',(SELECT CurrentValue FROM Orion.Settings where settingid='SWNetPerfMon-Settings-Retain Events'),' days ago')) as [Down Event]  
                          -- shows the timestamp of the up event, unless the object is still down  
                          , isnull(tostring(t2.[Up Event]),'Still Down') as [Up Event]  
                          -- figures out the minutes between the down and up events; if the object is still down it counts from the down event to now, and displays 99999 if we cannot accurately determine the original downtime  
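                          -- [Down Event] and [Up Event] are already local time (ToLocal), so "now" is taken from GETDATE() rather than GETUTCDATE() to avoid a time zone offset in the result  
                          , isnull(MINUTEDIFF(t2.[Down Event], isnull(t2.[Up Event],GETDATE())),99999) as Minutes  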
                            
                            
                          from orion.nodes n  
                          left join (SELECT      
                           -- Device nodeid used for our join     
                           StartTime.Nodes.NodeID       
                            
                           -- Down Event time stamp in local time zone      
                           ,ToLocal(StartTime.EventTime) AS [Down Event]      
                              
                           -- Up Event time stamp in local time zone      
                           ,(SELECT TOP 1      
                           ToLocal(EventTime) AS [EventTime]      
                           FROM Orion.Events AS [EndTime]      
                           -- picks the first up event that is newer than the down event for this node  
                            WHERE EndTime.EventTime >= StartTime.EventTime     
                           -- EventType 5 is a node up   
                            And endtime.eventtype = 5
                            AND EndTime.NetObjectID = StartTime.NetObjectID      
                            AND EventTime IS NOT NULL      
                            ORDER BY EndTime.EventTime      
                            ) AS [Up Event]
                          -- This is the table we are querying      
                          FROM Orion.Events StartTime      
                              
                          -- EventType 1 is a node down  
                          WHERE StartTime.EventType = 1      
                                
                          ) t2 on n.NodeID = t2.nodeid  
                            
                            
                          -- keeps only nodes that are currently down (status 2) with no later up event, limited to the three Notification Groups; the left join above still returns down nodes whose down event has aged out of the events table  
                          where n.status = 2
                          and t2.[Up Event] is NULL
                          and n.CustomProperties.Notification_Group in ('Server Group', 'Linux Group', 'Hosted Technology')
                            
                            
                          -- If you want to filter the results to only show outages of a minimum duration uncomment the below line  
                          --and MINUTEDIFF(isnull(t2.[Down Event],(GETUTCDATE()-30)), isnull(t2.[Up Event],GETUTCDATE())) >  60  
                            
                            
                          -- if you want to use this query in a search box of the Custom Query resource uncomment the below line  
                          --and n.Caption like '%${SEARCH_STRING}%'  
                            
                            
                          order by t2.[down event] desc 
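
                          If a device still shows up more than once, the reason is that the left join returns one row per down event. A possible alternative is to collapse the events to the most recent down event per node. The query below is only a minimal sketch and has not been run against an Orion instance: it assumes your SWQL version accepts an aggregate (MAX) together with GROUP BY and a second condition in the join, and the column names [Down Since] and [Minutes Down] are made up for illustration.

                          ```sql
                          select n.caption as [Device]
                          -- makes a clickable link to the node details
                          , n.DetailsUrl as [_linkfor_Device]
                          -- most recent down event still held in Orion.Events (NULL if it has aged out), shown in local time
                          , ToLocal(MAX(e.EventTime)) as [Down Since]
                          -- both values are UTC here, so the difference is not skewed by the time zone offset
                          , MINUTEDIFF(MAX(e.EventTime), GETUTCDATE()) as [Minutes Down]
                          from orion.nodes n
                          -- EventType 1 is a node down
                          left join Orion.Events e on e.NetObjectID = n.NodeID and e.EventType = 1
                          where n.status = 2
                          and n.CustomProperties.Notification_Group in ('Server Group', 'Linux Group', 'Hosted Technology')
                          group by n.caption, n.DetailsUrl
                          order by n.caption
                          ```

                          Because the node is currently down (status 2), its latest down event should be the start of the current outage, so no up-event lookup is needed in this variant.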
                          
                  • Re: Need a resource that shows down nodes and Down time
                    bobmarley

                    Here is a simple SWQL query:

                     

                     

                    SELECT TOP 1000 IPAddress, Caption, StatusDescription
                    FROM Orion.Nodes
                    WHERE StatusDescription LIKE 'Node Status Is Down.'
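
                    The same list can also be filtered on the numeric status instead of the description text. This is a small sketch that relies on status 2 meaning down, as noted earlier in the thread; it has not been tested here.

                    ```sql
                    SELECT TOP 1000 IPAddress, Caption, StatusDescription
                    FROM Orion.Nodes
                    -- Status 2 is down
                    WHERE Status = 2
                    ```

                    Matching on the number avoids depending on the exact wording of StatusDescription.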

                    • Re: Need a resource that shows down nodes and Down time
                      jtimes

                      Is there a way to just show the currently down devices with this SWQL?