13 Replies Latest reply on Jul 25, 2016 5:03 PM by zackm

    Help with application downtime history report

    patriot

      I have the current versions of NPM and SAM installed and I have found the OOB reports for node and interface downtime history. I don't see a similar report for application downtime history.

       

      Ideally, I would like to have a report that shows each down event with a start and ending timestamp and the duration of each event grouped by server name. Also, I would want to convert the minutes to hours or days but I don't know how to do that.

       

      Can anyone help me please?

        • Re: Help with application downtime history report
          patriot

          Well? Hmmmm? Thanks.

          • Re: Help with application downtime history report
            jmmartin

            We are currently in the same boat, Right now my CTO (Chief Technology Officer) is trying to justify SAM as a Downtime reporting center for Applications, Need a report that will show the weeks outage percentage per application per system, and then a monthly one and then a quarterly one.   need to tie app to server to percentage down time for the week for all our systems, being a Research/Education/Hospital facility this is becoming one of our biggest issues for us maintaining Solar winds as our network team is moving to Cisco Prime for Cisco Device Management.  As a system admin the functions of SAM are tremendous, the reporting and ability to tie multiple tables together from the web console is severely lacking.

             

            Any one with an idea how to do this to make the money guys happy, please speak up.

            • Re: Help with application downtime history report
              patriot

              No one? Please offer up if you can do this?

              • Re: Help with application downtime history report
                patriot

                Can I at least get someone with SQL expertise to tell me if this is currently possible with the Orion DB schema? Please? This is really important and I cannot seem to get anyone to answer a simple question. Thank you.

                • Re: Help with application downtime history report
                  mesverrum

                  Seems a shame that nobody was able to jump on this one for you.  This is based on a reworked node downtime query from one of my colleagues, with some minor tweaks i just made it should do what you want.

                   

                  This is set up to work as a swql query resource on a summary page, if you paste this directly into a custom report i can only guess what might break

                  To make this searchable you just copy the same query into the box that appears when you check "Enable Search" and remove the commenting dashes from line 97

                   

                   

                  --Application Downtime Report for Application Details View
                  
                  SELECT
                    -- Device name
                    StartTime.Nodes.Caption AS [Device]
                    , starttime.nodes.detailsurl as [_linkfor_Device]
                          ,'/Orion/images/StatusIcons/Small-' + starttime.nodes.statusicon AS [_IconFor_Device]
                    -- Application name
                    ,a.Name
                  ,a.detailsurl as [_linkfor_Name]
                  , '/Orion/images/StatusIcons/Small-' + a.StatusDescription + '.gif' as [_iconfor_Name]
                    -- Down Event Message
                    ,starttime.message as [Down Event Message]
                  
                  
                    -- Down Event time stamp in local time zone
                    ,ToLocal(StartTime.EventTime) AS [Down Event]
                  
                  
                    -- Up Event Message
                    ,(SELECT TOP 1
                    Message AS [Message]
                    FROM Orion.Events AS [EndTime]
                    WHERE EndTime.EventTime >= StartTime.EventTime
                    AND (EndTime.EventType = 504 -- up
                    OR EndTime.EventType = 510 -- warning
                    OR EndTime.EventType = 509 -- critical
                    OR EndTime.EventType = 512) -- resumed
                    AND EndTime.NetworkNode = StartTime.NetworkNode
                    AND EndTime.NetObjectType like 'a%'
                    AND EndTime.NetObjectID = StartTime.NetObjectID
                    AND EventTime IS NOT NULL
                    ORDER BY EndTime.EventTime
                    ) AS [Up Event Message]
                  
                  
                    -- Up Event time stamp in local time zone
                    ,(SELECT TOP 1
                    ToLocal(EventTime) AS [EventTime]
                    FROM Orion.Events AS [EndTime]
                    WHERE EndTime.EventTime >= StartTime.EventTime
                    AND (EndTime.EventType = 504 -- up
                    OR EndTime.EventType = 510 -- warning
                    OR EndTime.EventType = 509 -- critical
                    OR EndTime.EventType = 512) -- resumed
                    AND EndTime.NetworkNode = StartTime.NetworkNode
                    AND EndTime.NetObjectType like 'a%'
                    AND EndTime.NetObjectID = StartTime.NetObjectID
                    AND EventTime IS NOT NULL
                    ORDER BY EndTime.EventTime
                    ) AS [Up Event]
                  
                    -- Downtime duration in minutes. Based on either the current time (node is still down) or Up Event time (node is back up)
                    ,CASE
                    WHEN (SELECT TOP 1 ToLocal(EventTime) AS [EventTime]
                    FROM Orion.Events AS [EndTime]
                    WHERE EndTime.EventTime >= StartTime.EventTime
                    AND (EndTime.EventType = 504 -- up
                    OR EndTime.EventType = 510 -- warning
                    OR EndTime.EventType = 509 -- critical
                    OR EndTime.EventType = 512) -- resumed
                    AND EndTime.NetworkNode = StartTime.NetworkNode
                    AND EndTime.NetObjectType like 'a%'
                    AND EndTime.NetObjectID = StartTime.NetObjectID
                    AND EventTime IS NOT NULL
                    ORDER BY EndTime.EventTime) IS NULL THEN MINUTEDIFF(StartTime.EventTime, GETUTCDATE())
                    ELSE MINUTEDIFF(StartTime.EventTime,
                    (SELECT TOP 1 EventTime
                    FROM Orion.Events AS [EndTime]
                    WHERE EndTime.EventTime > StartTime.EventTime
                    AND (EndTime.EventType = 504 -- up
                    OR EndTime.EventType = 510 -- warning
                    OR EndTime.EventType = 509 -- critical
                    OR EndTime.EventType = 512) -- resumed
                    AND EndTime.NetworkNode = StartTime.NetworkNode
                    AND EndTime.NetObjectType like 'a%'
                    AND EndTime.NetObjectID = StartTime.NetObjectID
                    ORDER BY EndTime.EventTime))
                    END AS [Minutes]
                  
                  
                  -- This is the table we are querying
                  FROM Orion.Events StartTime
                  join orion.apm.application a on a.applicationid=starttime.netobjectid
                  
                  
                  -- EventType = 1 is to correlate with our 'Down Event' time stamp from line 13 above
                  WHERE (StartTime.EventType = 505 -- down and unknown
                  or StartTime.EventType = 513 -- unreachable
                  or StartTime.EventType = 511) -- suspended
                  
                  
                  and StartTime.NetObjectType like 'a%'
                  and daydiff(starttime.eventtime,getdate())<31
                  
                  --remove the comment from this part to make this searchable in a custom query resource
                  -- and ( StartTime.Nodes.Caption like '%${SEARCH_STRING}%' or a.Name like '%${SEARCH_STRING}%')
                  
                  -- Order our results from the newest to oldest 'Down Event'
                  ORDER BY [Down Event] DESC
                  

                   

                   

                  -Marc Netterfield

                      Loop1 Systems: SolarWinds Training and Professional Services

                    • Re: Help with application downtime history report
                      borgan

                      This is great Marc! Thanks so much.

                       

                       

                       

                      Let me make sure I understand the parameters of this version and then ask

                      how to make a couple of on the fly edits if needed:

                       

                       

                       

                      As I understand it:

                       

                       

                       

                      1.     The query shows the start and end time of each event along with the

                      number of minutes of duration, correct?

                       

                      2.     Does the query cover the last 31 days as a rolling window, or is it

                      the last calendar month?

                       

                      3.     Does the query count ONLY application down as the event type?

                       

                       

                       

                      Now, how would I:

                       

                       

                       

                      1.     Change the time period to the last 24 hours, or the last 7 days?

                       

                      2.     Group the data by device?

                       

                      3.     Hide the Down and Up Event Message?

                       

                       

                       

                       

                       

                      Brett Organ

                       

                      Senior Consulting Engineer and Trainer

                       

                      Corona Technical Services

                       

                      918-398-8052 x106

                        • Re: Help with application downtime history report
                          mesverrum

                          It is a bit complicated but it is looking for a series of down type events (down, unknown, unreachable, suspended as commented in lines 88-90) and then looks for the next event that could imply the status has improved from those (as commented in line 25-28) and then it figures how long between those events. It is honestly a bit of a guess since there are edge scenarios that could potentially confuse the logic it uses.  That is the reason why I think its risky to not include the down and up messages, I don't trust it 100% unless I can look at those and the transitions and timelines make sense.

                           

                          Line 94 defines how far it looks back, in this case a rolling 31 days.  changing it to 24 hours would mean changing that line to

                          1. and hourdiff(starttime.eventtime,getdate())<24 

                           

                          The way it is written your best bet to group it would be changing line 100 to

                               order by [Device] asc, [Down Event] desc

                           

                          To hide the messages comment out (--) lines 13 and 21-34

                           

                          -Marc Netterfield

                              Loop1 Systems: SolarWinds Training and Professional Services

                          • Re: Help with application downtime history report
                            zackm

                            Hi Brett!

                             

                            It looks like you've got some great questions about how to utilize SQL and SWQL syntax in order to provide a more robust and customized experience with the SolarWinds Orion product suite.

                             

                            You might want to take a look at our Public Training 301 course where we deep-dive into not only the basics of SQL architecture best practices and query syntax, but also how to leverage SWQL and the SolarWinds SDK as well as SQL Server Reporting Services (SSRS) to really take your expertise to the next level. It is a heck of a lot of fun, plus you get to hang out with us in Austin with great food, sights, and music! As a long time user/admin/guru in the SolarWinds world, I strongly support and encourage anyone's interest in these technologies as it really just makes the whole experience better.

                             

                            If you, or anyone else you know, are interested, feel free to reach out to me either here in Thwack or via my contact info below.

                             

                            Thanks for the kind words to mesverrum as well, he's one of our best!

                             

                            -ZackM

                            Loop1 Systems: SolarWinds Training and Professional Services