This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Help with application downtime history report

I have the current versions of NPM and SAM installed and I have found the OOB reports for node and interface downtime history. I don't see a similar report for application downtime history.

Ideally, I would like to have a report that shows each down event with a start and ending timestamp and the duration of each event grouped by server name. Also, I would want to convert the minutes to hours or days but I don't know how to do that.

Can anyone help me please?

  • We are currently in the same boat, Right now my CTO (Chief Technology Officer) is trying to justify SAM as a Downtime reporting center for Applications, Need a report that will show the weeks outage percentage per application per system, and then a monthly one and then a quarterly one.   need to tie app to server to percentage down time for the week for all our systems, being a Research/Education/Hospital facility this is becoming one of our biggest issues for us maintaining Solar winds as our network team is moving to Cisco Prime for Cisco Device Management.  As a system admin the functions of SAM are tremendous, the reporting and ability to tie multiple tables together from the web console is severely lacking.

    Any one with an idea how to do this to make the money guys happy, please speak up.

  • Anyone able to help with this report. There is no "downtime history" category for Applications like there is for Nodes.

  • No one? Please offer up if you can do this?

  • Seriously, no one on Thwack with SQL expertise has an idea?

  • Can I at least get someone with SQL expertise to tell me if this is currently possible with the Orion DB schema? Please? This is really important and I cannot seem to get anyone to answer a simple question. Thank you.

  • Seems a shame that nobody was able to jump on this one for you.  This is based on a reworked node downtime query from one of my colleagues, with some minor tweaks i just made it should do what you want.

    This is set up to work as a swql query resource on a summary page, if you paste this directly into a custom report i can only guess what might break emoticons_happy.png

    To make this searchable you just copy the same query into the box that appears when you check "Enable Search" and remove the commenting dashes from line 97

    --Application Downtime Report for Application Details View

    SELECT

      -- Device name

      StartTime.Nodes.Caption AS [Device]

      , starttime.nodes.detailsurl as [_linkfor_Device]

            ,'/Orion/images/StatusIcons/Small-' + starttime.nodes.statusicon AS [_IconFor_Device]

      -- Application name

      ,a.Name

    ,a.detailsurl as [_linkfor_Name]

    , '/Orion/images/StatusIcons/Small-' + a.StatusDescription + '.gif' as [_iconfor_Name]

      -- Down Event Message

      ,starttime.message as [Down Event Message]

      -- Down Event time stamp in local time zone

      ,ToLocal(StartTime.EventTime) AS [Down Event]

      -- Up Event Message

      ,(SELECT TOP 1

      Message AS [Message]

      FROM Orion.Events AS [EndTime]

      WHERE EndTime.EventTime >= StartTime.EventTime

      AND (EndTime.EventType = 504 -- up

      OR EndTime.EventType = 510 -- warning

      OR EndTime.EventType = 509 -- critical

      OR EndTime.EventType = 512) -- resumed

      AND EndTime.NetworkNode = StartTime.NetworkNode

      AND EndTime.NetObjectType like 'a%'

      AND EndTime.NetObjectID = StartTime.NetObjectID

      AND EventTime IS NOT NULL

      ORDER BY EndTime.EventTime

      ) AS [Up Event Message]

      -- Up Event time stamp in local time zone

      ,(SELECT TOP 1

      ToLocal(EventTime) AS [EventTime]

      FROM Orion.Events AS [EndTime]

      WHERE EndTime.EventTime >= StartTime.EventTime

      AND (EndTime.EventType = 504 -- up

      OR EndTime.EventType = 510 -- warning

      OR EndTime.EventType = 509 -- critical

      OR EndTime.EventType = 512) -- resumed

      AND EndTime.NetworkNode = StartTime.NetworkNode

      AND EndTime.NetObjectType like 'a%'

      AND EndTime.NetObjectID = StartTime.NetObjectID

      AND EventTime IS NOT NULL

      ORDER BY EndTime.EventTime

      ) AS [Up Event]

      -- Downtime duration in minutes. Based on either the current time (node is still down) or Up Event time (node is back up)

      ,CASE

      WHEN (SELECT TOP 1 ToLocal(EventTime) AS [EventTime]

      FROM Orion.Events AS [EndTime]

      WHERE EndTime.EventTime >= StartTime.EventTime

      AND (EndTime.EventType = 504 -- up

      OR EndTime.EventType = 510 -- warning

      OR EndTime.EventType = 509 -- critical

      OR EndTime.EventType = 512) -- resumed

      AND EndTime.NetworkNode = StartTime.NetworkNode

      AND EndTime.NetObjectType like 'a%'

      AND EndTime.NetObjectID = StartTime.NetObjectID

      AND EventTime IS NOT NULL

      ORDER BY EndTime.EventTime) IS NULL THEN MINUTEDIFF(StartTime.EventTime, GETUTCDATE())

      ELSE MINUTEDIFF(StartTime.EventTime,

      (SELECT TOP 1 EventTime

      FROM Orion.Events AS [EndTime]

      WHERE EndTime.EventTime > StartTime.EventTime

      AND (EndTime.EventType = 504 -- up

      OR EndTime.EventType = 510 -- warning

      OR EndTime.EventType = 509 -- critical

      OR EndTime.EventType = 512) -- resumed

      AND EndTime.NetworkNode = StartTime.NetworkNode

      AND EndTime.NetObjectType like 'a%'

      AND EndTime.NetObjectID = StartTime.NetObjectID

      ORDER BY EndTime.EventTime))

      END AS [Minutes]

    -- This is the table we are querying

    FROM Orion.Events StartTime

    join orion.apm.application a on a.applicationid=starttime.netobjectid

    -- EventType = 1 is to correlate with our 'Down Event' time stamp from line 13 above

    WHERE (StartTime.EventType = 505 -- down and unknown

    or StartTime.EventType = 513 -- unreachable

    or StartTime.EventType = 511) -- suspended

    and StartTime.NetObjectType like 'a%'

    and daydiff(starttime.eventtime,getdate())<31

    --remove the comment from this part to make this searchable in a custom query resource

    -- and ( StartTime.Nodes.Caption like '%${SEARCH_STRING}%' or a.Name like '%${SEARCH_STRING}%')

    -- Order our results from the newest to oldest 'Down Event'

    ORDER BY [Down Event] DESC

    -Marc Netterfield

        Loop1 Systems: SolarWinds Training and Professional Services

  • This is great Marc! Thanks so much.

    Let me make sure I understand the parameters of this version and then ask

    how to make a couple of on the fly edits if needed:

    As I understand it:

    1. The query shows the start and end time of each event along with the

    number of minutes of duration, correct?

    2. Does the query cover the last 31 days as a rolling window, or is it

    the last calendar month?

    3. Does the query count ONLY application down as the event type?

    Now, how would I:

    1. Change the time period to the last 24 hours, or the last 7 days?

    2. Group the data by device?

    3. Hide the Down and Up Event Message?

    Brett Organ

    Senior Consulting Engineer and Trainer

    Corona Technical Services

    918-398-8052 x106

  • It is a bit complicated but it is looking for a series of down type events (down, unknown, unreachable, suspended as commented in lines 88-90) and then looks for the next event that could imply the status has improved from those (as commented in line 25-28) and then it figures how long between those events. It is honestly a bit of a guess since there are edge scenarios that could potentially confuse the logic it uses.  That is the reason why I think its risky to not include the down and up messages, I don't trust it 100% unless I can look at those and the transitions and timelines make sense.

    Line 94 defines how far it looks back, in this case a rolling 31 days.  changing it to 24 hours would mean changing that line to

    1. and hourdiff(starttime.eventtime,getdate())<24 

    The way it is written your best bet to group it would be changing line 100 to

         order by [Device] asc, [Down Event] desc

    To hide the messages comment out (--) lines 13 and 21-34

    -Marc Netterfield

        Loop1 Systems: SolarWinds Training and Professional Services