13 Replies Latest reply on Jul 25, 2016 5:03 PM by zackm

    Help with application downtime history report

    patriot

      I have the current versions of NPM and SAM installed and I have found the OOB reports for node and interface downtime history. I don't see a similar report for application downtime history.

       

      Ideally, I would like to have a report that shows each down event with a start and ending timestamp and the duration of each event grouped by server name. Also, I would want to convert the minutes to hours or days but I don't know how to do that.

       

      Can anyone help me please?

        • Re: Help with application downtime history report
          jmmartin

          We are currently in the same boat, Right now my CTO (Chief Technology Officer) is trying to justify SAM as a Downtime reporting center for Applications, Need a report that will show the weeks outage percentage per application per system, and then a monthly one and then a quarterly one.   need to tie app to server to percentage down time for the week for all our systems, being a Research/Education/Hospital facility this is becoming one of our biggest issues for us maintaining Solar winds as our network team is moving to Cisco Prime for Cisco Device Management.  As a system admin the functions of SAM are tremendous, the reporting and ability to tie multiple tables together from the web console is severely lacking.

           

          Any one with an idea how to do this to make the money guys happy, please speak up.

          • Re: Help with application downtime history report
            patriot

            No one? Please offer up if you can do this?

            • Re: Help with application downtime history report
              patriot

              Can I at least get someone with SQL expertise to tell me if this is currently possible with the Orion DB schema? Please? This is really important and I cannot seem to get anyone to answer a simple question. Thank you.

              • Re: Help with application downtime history report
                mesverrum

                Seems a shame that nobody was able to jump on this one for you.  This is based on a reworked node downtime query from one of my colleagues, with some minor tweaks i just made it should do what you want.

                 

                This is set up to work as a swql query resource on a summary page, if you paste this directly into a custom report i can only guess what might break

                To make this searchable you just copy the same query into the box that appears when you check "Enable Search" and remove the commenting dashes from line 97

                 

                 

                --Application Downtime Report for Application Details View
                
                SELECT
                  -- Device name
                  StartTime.Nodes.Caption AS [Device]
                  , starttime.nodes.detailsurl as [_linkfor_Device]
                        ,'/Orion/images/StatusIcons/Small-' + starttime.nodes.statusicon AS [_IconFor_Device]
                  -- Application name
                  ,a.Name
                ,a.detailsurl as [_linkfor_Name]
                , '/Orion/images/StatusIcons/Small-' + a.StatusDescription + '.gif' as [_iconfor_Name]
                  -- Down Event Message
                  ,starttime.message as [Down Event Message]
                
                
                  -- Down Event time stamp in local time zone
                  ,ToLocal(StartTime.EventTime) AS [Down Event]
                
                
                  -- Up Event Message
                  ,(SELECT TOP 1
                  Message AS [Message]
                  FROM Orion.Events AS [EndTime]
                  WHERE EndTime.EventTime >= StartTime.EventTime
                  AND (EndTime.EventType = 504 -- up
                  OR EndTime.EventType = 510 -- warning
                  OR EndTime.EventType = 509 -- critical
                  OR EndTime.EventType = 512) -- resumed
                  AND EndTime.NetworkNode = StartTime.NetworkNode
                  AND EndTime.NetObjectType like 'a%'
                  AND EndTime.NetObjectID = StartTime.NetObjectID
                  AND EventTime IS NOT NULL
                  ORDER BY EndTime.EventTime
                  ) AS [Up Event Message]
                
                
                  -- Up Event time stamp in local time zone
                  ,(SELECT TOP 1
                  ToLocal(EventTime) AS [EventTime]
                  FROM Orion.Events AS [EndTime]
                  WHERE EndTime.EventTime >= StartTime.EventTime
                  AND (EndTime.EventType = 504 -- up
                  OR EndTime.EventType = 510 -- warning
                  OR EndTime.EventType = 509 -- critical
                  OR EndTime.EventType = 512) -- resumed
                  AND EndTime.NetworkNode = StartTime.NetworkNode
                  AND EndTime.NetObjectType like 'a%'
                  AND EndTime.NetObjectID = StartTime.NetObjectID
                  AND EventTime IS NOT NULL
                  ORDER BY EndTime.EventTime
                  ) AS [Up Event]
                
                  -- Downtime duration in minutes. Based on either the current time (node is still down) or Up Event time (node is back up)
                  ,CASE
                  WHEN (SELECT TOP 1 ToLocal(EventTime) AS [EventTime]
                  FROM Orion.Events AS [EndTime]
                  WHERE EndTime.EventTime >= StartTime.EventTime
                  AND (EndTime.EventType = 504 -- up
                  OR EndTime.EventType = 510 -- warning
                  OR EndTime.EventType = 509 -- critical
                  OR EndTime.EventType = 512) -- resumed
                  AND EndTime.NetworkNode = StartTime.NetworkNode
                  AND EndTime.NetObjectType like 'a%'
                  AND EndTime.NetObjectID = StartTime.NetObjectID
                  AND EventTime IS NOT NULL
                  ORDER BY EndTime.EventTime) IS NULL THEN MINUTEDIFF(StartTime.EventTime, GETUTCDATE())
                  ELSE MINUTEDIFF(StartTime.EventTime,
                  (SELECT TOP 1 EventTime
                  FROM Orion.Events AS [EndTime]
                  WHERE EndTime.EventTime > StartTime.EventTime
                  AND (EndTime.EventType = 504 -- up
                  OR EndTime.EventType = 510 -- warning
                  OR EndTime.EventType = 509 -- critical
                  OR EndTime.EventType = 512) -- resumed
                  AND EndTime.NetworkNode = StartTime.NetworkNode
                  AND EndTime.NetObjectType like 'a%'
                  AND EndTime.NetObjectID = StartTime.NetObjectID
                  ORDER BY EndTime.EventTime))
                  END AS [Minutes]
                
                
                -- This is the table we are querying
                FROM Orion.Events StartTime
                join orion.apm.application a on a.applicationid=starttime.netobjectid
                
                
                -- EventType = 1 is to correlate with our 'Down Event' time stamp from line 13 above
                WHERE (StartTime.EventType = 505 -- down and unknown
                or StartTime.EventType = 513 -- unreachable
                or StartTime.EventType = 511) -- suspended
                
                
                and StartTime.NetObjectType like 'a%'
                and daydiff(starttime.eventtime,getdate())<31
                
                --remove the comment from this part to make this searchable in a custom query resource
                -- and ( StartTime.Nodes.Caption like '%${SEARCH_STRING}%' or a.Name like '%${SEARCH_STRING}%')
                
                -- Order our results from the newest to oldest 'Down Event'
                ORDER BY [Down Event] DESC
                

                 

                 

                -Marc Netterfield

                    Loop1 Systems: SolarWinds Training and Professional Services

                  • Re: Help with application downtime history report
                    borgan

                    This is great Marc! Thanks so much.

                     

                     

                     

                    Let me make sure I understand the parameters of this version and then ask

                    how to make a couple of on the fly edits if needed:

                     

                     

                     

                    As I understand it:

                     

                     

                     

                    1.     The query shows the start and end time of each event along with the

                    number of minutes of duration, correct?

                     

                    2.     Does the query cover the last 31 days as a rolling window, or is it

                    the last calendar month?

                     

                    3.     Does the query count ONLY application down as the event type?

                     

                     

                     

                    Now, how would I:

                     

                     

                     

                    1.     Change the time period to the last 24 hours, or the last 7 days?

                     

                    2.     Group the data by device?

                     

                    3.     Hide the Down and Up Event Message?

                     

                     

                     

                     

                     

                    Brett Organ

                     

                    Senior Consulting Engineer and Trainer

                     

                    Corona Technical Services

                     

                    918-398-8052 x106

                      • Re: Help with application downtime history report
                        mesverrum

                        It is a bit complicated but it is looking for a series of down type events (down, unknown, unreachable, suspended as commented in lines 88-90) and then looks for the next event that could imply the status has improved from those (as commented in line 25-28) and then it figures how long between those events. It is honestly a bit of a guess since there are edge scenarios that could potentially confuse the logic it uses.  That is the reason why I think its risky to not include the down and up messages, I don't trust it 100% unless I can look at those and the transitions and timelines make sense.

                         

                        Line 94 defines how far it looks back, in this case a rolling 31 days.  changing it to 24 hours would mean changing that line to

                        1. and hourdiff(starttime.eventtime,getdate())<24 

                         

                        The way it is written your best bet to group it would be changing line 100 to

                             order by [Device] asc, [Down Event] desc

                         

                        To hide the messages comment out (--) lines 13 and 21-34

                         

                        -Marc Netterfield

                            Loop1 Systems: SolarWinds Training and Professional Services

                        • Re: Help with application downtime history report
                          zackm

                          Hi Brett!

                           

                          It looks like you've got some great questions about how to utilize SQL and SWQL syntax in order to provide a more robust and customized experience with the SolarWinds Orion product suite.

                           

                          You might want to take a look at our Public Training 301 course where we deep-dive into not only the basics of SQL architecture best practices and query syntax, but also how to leverage SWQL and the SolarWinds SDK as well as SQL Server Reporting Services (SSRS) to really take your expertise to the next level. It is a heck of a lot of fun, plus you get to hang out with us in Austin with great food, sights, and music! As a long time user/admin/guru in the SolarWinds world, I strongly support and encourage anyone's interest in these technologies as it really just makes the whole experience better.

                           

                          If you, or anyone else you know, are interested, feel free to reach out to me either here in Thwack or via my contact info below.

                           

                          Thanks for the kind words to mesverrum as well, he's one of our best!

                           

                          -ZackM

                          Loop1 Systems: SolarWinds Training and Professional Services