cancel
Showing results for 
Search instead for 
Did you mean: 
Create Post
Level 7

Is there a way to add minutes to availability on a report?

Jump to solution

I have a report i've made with nodes and availability for the month in a percentage but i desperately need to know how many minutes that month the site was not up.  I've gone through everything i can see in the web report designer and cannot find a way to do this. it seems like it should be very simple so maybe i'm missing something.  Can anyone help?  Thank you.

0 Kudos
1 Solution

This custom SWQL query sums up the durations of the recorded outages over the last 31 days, you are free to get more fancy with the time range logic if you like.  I should also mention this query times out in larger environments so I'd probably take a different approach if I needed this data for someone with 10,000 servers, probably just create new tables that tracked the relevant data as it gets entered in with a format that was easier on the DB to calculate.

 

select a.node.Caption as Node
, a.node.DetailsUrl as [_linkfor_Node]
, a.name as Application
, a.DetailsUrl as [_linkfor_Application]
, sum(ev.minutes) as Down_Duration_Last_31_Days

from orion.apm.Application a
join (
--Application Downtime Report for Application Details View 
SELECT 

 -- Device name 
 StartTime.Nodes.Caption AS [Device] 
  
 -- Application name 
 ,a.name 
 , a.ApplicationID
 -- Down Event Message 
 ,starttime.message as [Down Event Message] 
 
 -- Down Event time stamp in local time zone 
 ,ToLocal(StartTime.EventTime) AS [Down Event] 
 
 -- Up Event Message 
 ,(SELECT TOP 1  
  Message AS [Message]  
 FROM Orion.Events AS [EndTime] 
  WHERE EndTime.EventTime >= StartTime.EventTime 
  AND (EndTime.EventType = 504 -- up 
  OR EndTime.EventType = 510 -- warning 
  OR EndTime.EventType = 509 -- critical 
  OR EndTime.EventType = 512) -- resumed 
  AND EndTime.NetworkNode = StartTime.NetworkNode 
  AND EndTime.NetObjectType like 'a%' 
  AND EndTime.NetObjectID = StartTime.NetObjectID 
  AND EventTime IS NOT NULL 
 ORDER BY EndTime.EventTime 
 ) AS [Up Event Message] 
 
 -- Up Event time stamp in local time zone 
 ,(SELECT TOP 1  
  ToLocal(EventTime) AS [EventTime]  
 FROM Orion.Events AS [EndTime] 
  WHERE EndTime.EventTime >= StartTime.EventTime 
  AND (EndTime.EventType = 504 -- up 
  OR EndTime.EventType = 510 -- warning 
  OR EndTime.EventType = 509 -- critical 
  OR EndTime.EventType = 512) -- resumed 
  AND EndTime.NetworkNode = StartTime.NetworkNode 
  AND EndTime.NetObjectType like 'a%' 
  AND EndTime.NetObjectID = StartTime.NetObjectID 
  AND EventTime IS NOT NULL 
 ORDER BY EndTime.EventTime 
 ) AS [Up Event] 
  
 -- Downtime duration in minutes. Based on either the current time (node is still down) or Up Event time (node is back up) 
 ,CASE 
  WHEN (SELECT TOP 1 ToLocal(EventTime) AS [EventTime]  
   FROM Orion.Events AS [EndTime] 
   WHERE EndTime.EventTime >= StartTime.EventTime 
   AND (EndTime.EventType = 504 -- up 
   OR EndTime.EventType = 510 -- warning 
   OR EndTime.EventType = 509 -- critical 
   OR EndTime.EventType = 512) -- resumed 
   AND EndTime.NetworkNode = StartTime.NetworkNode 
   AND EndTime.NetObjectType like 'a%' 
   AND EndTime.NetObjectID = StartTime.NetObjectID 
   AND EventTime IS NOT NULL 
   ORDER BY EndTime.EventTime) IS NULL THEN MINUTEDIFF(StartTime.EventTime, GETUTCDATE()) 
  ELSE MINUTEDIFF(StartTime.EventTime,  
   (SELECT TOP 1 EventTime  
   FROM Orion.Events AS [EndTime] 
   WHERE EndTime.EventTime > StartTime.EventTime 
   AND (EndTime.EventType = 504 -- up 
   OR EndTime.EventType = 510 -- warning 
   OR EndTime.EventType = 509 -- critical 
   OR EndTime.EventType = 512) -- resumed 
   AND EndTime.NetworkNode = StartTime.NetworkNode 
   AND EndTime.NetObjectType like 'a%' 
   AND EndTime.NetObjectID = StartTime.NetObjectID 
   ORDER BY EndTime.EventTime)) 
 END AS [Minutes] 
  
  
-- This is the table we are querying  
FROM Orion.Events StartTime 
join orion.apm.application a on a.applicationid=starttime.netobjectid 
 
-- EventType = 1 is to correlate with our 'Down Event' time stamp from line 13 above 
WHERE ((StartTime.EventType = 505 and StartTime.message not like '%unknown%')-- 505 is down and unknown, unknowns are fairly common 
or StartTime.EventType = 513 -- unreachable 
or StartTime.EventType = 511) -- suspended 
 
-- Here's the trick to get the query to dynamically pull details for the node you are looking at 
AND StartTime.NetObjectType like 'a%' 
and daydiff(starttime.eventtime,getdate())<31 
 
-- Order our results from the newest to oldest 'Down Event' 
ORDER BY [Down Event] DESC
) ev on a.applicationid = ev.applicationid

group by a.node.Caption, a.node.DetailsUrl, a.name, a.DetailsUrl
- Marc Netterfield, Github

View solution in original post

0 Kudos
6 Replies
Level 20

There's a downtime with duration report writer report I used to use all the time on NPM.  I'm not sure where it is off the top of my head either though.  I really liked it too.

0 Kudos

I found a downtime report but it listed every node multiple times and I have to parse all the info in excel.  I was hoping we could have one that added then all together for the month like:

Site1     98% availibity    Down 127 minutes.

 

the report i have for downtime that does have minutes on it lists it like this.

 

Site 1       Down 3 minutes

Site 2       down 4 minutes

Site 1       down 2 minutes

etc...  It will repeat sites and requires someone to parse it in excel to get the data.

 

If there is one that adds them all togther that would be fantastic. 

0 Kudos
So when it comes to the gui driven parts of the report writer it will only show you data points that are recorded in the database, and unfortunately the duration of a down period is not recorded, just the timestamps when Orion saw the device go down or up. (Technically there it's a downtime table but when I've tried to use it the numbers were wildly wrong so I don't rely on it). There are a few sql/swql reports on thwack that do the math of comparing down and up event timestamps so we can get the outage duration, I'll dig around and find the link to one here in a moment.
- Marc Netterfield, Github
0 Kudos

This custom SWQL query sums up the durations of the recorded outages over the last 31 days, you are free to get more fancy with the time range logic if you like.  I should also mention this query times out in larger environments so I'd probably take a different approach if I needed this data for someone with 10,000 servers, probably just create new tables that tracked the relevant data as it gets entered in with a format that was easier on the DB to calculate.

 

select a.node.Caption as Node
, a.node.DetailsUrl as [_linkfor_Node]
, a.name as Application
, a.DetailsUrl as [_linkfor_Application]
, sum(ev.minutes) as Down_Duration_Last_31_Days

from orion.apm.Application a
join (
--Application Downtime Report for Application Details View 
SELECT 

 -- Device name 
 StartTime.Nodes.Caption AS [Device] 
  
 -- Application name 
 ,a.name 
 , a.ApplicationID
 -- Down Event Message 
 ,starttime.message as [Down Event Message] 
 
 -- Down Event time stamp in local time zone 
 ,ToLocal(StartTime.EventTime) AS [Down Event] 
 
 -- Up Event Message 
 ,(SELECT TOP 1  
  Message AS [Message]  
 FROM Orion.Events AS [EndTime] 
  WHERE EndTime.EventTime >= StartTime.EventTime 
  AND (EndTime.EventType = 504 -- up 
  OR EndTime.EventType = 510 -- warning 
  OR EndTime.EventType = 509 -- critical 
  OR EndTime.EventType = 512) -- resumed 
  AND EndTime.NetworkNode = StartTime.NetworkNode 
  AND EndTime.NetObjectType like 'a%' 
  AND EndTime.NetObjectID = StartTime.NetObjectID 
  AND EventTime IS NOT NULL 
 ORDER BY EndTime.EventTime 
 ) AS [Up Event Message] 
 
 -- Up Event time stamp in local time zone 
 ,(SELECT TOP 1  
  ToLocal(EventTime) AS [EventTime]  
 FROM Orion.Events AS [EndTime] 
  WHERE EndTime.EventTime >= StartTime.EventTime 
  AND (EndTime.EventType = 504 -- up 
  OR EndTime.EventType = 510 -- warning 
  OR EndTime.EventType = 509 -- critical 
  OR EndTime.EventType = 512) -- resumed 
  AND EndTime.NetworkNode = StartTime.NetworkNode 
  AND EndTime.NetObjectType like 'a%' 
  AND EndTime.NetObjectID = StartTime.NetObjectID 
  AND EventTime IS NOT NULL 
 ORDER BY EndTime.EventTime 
 ) AS [Up Event] 
  
 -- Downtime duration in minutes. Based on either the current time (node is still down) or Up Event time (node is back up) 
 ,CASE 
  WHEN (SELECT TOP 1 ToLocal(EventTime) AS [EventTime]  
   FROM Orion.Events AS [EndTime] 
   WHERE EndTime.EventTime >= StartTime.EventTime 
   AND (EndTime.EventType = 504 -- up 
   OR EndTime.EventType = 510 -- warning 
   OR EndTime.EventType = 509 -- critical 
   OR EndTime.EventType = 512) -- resumed 
   AND EndTime.NetworkNode = StartTime.NetworkNode 
   AND EndTime.NetObjectType like 'a%' 
   AND EndTime.NetObjectID = StartTime.NetObjectID 
   AND EventTime IS NOT NULL 
   ORDER BY EndTime.EventTime) IS NULL THEN MINUTEDIFF(StartTime.EventTime, GETUTCDATE()) 
  ELSE MINUTEDIFF(StartTime.EventTime,  
   (SELECT TOP 1 EventTime  
   FROM Orion.Events AS [EndTime] 
   WHERE EndTime.EventTime > StartTime.EventTime 
   AND (EndTime.EventType = 504 -- up 
   OR EndTime.EventType = 510 -- warning 
   OR EndTime.EventType = 509 -- critical 
   OR EndTime.EventType = 512) -- resumed 
   AND EndTime.NetworkNode = StartTime.NetworkNode 
   AND EndTime.NetObjectType like 'a%' 
   AND EndTime.NetObjectID = StartTime.NetObjectID 
   ORDER BY EndTime.EventTime)) 
 END AS [Minutes] 
  
  
-- This is the table we are querying  
FROM Orion.Events StartTime 
join orion.apm.application a on a.applicationid=starttime.netobjectid 
 
-- EventType = 1 is to correlate with our 'Down Event' time stamp from line 13 above 
WHERE ((StartTime.EventType = 505 and StartTime.message not like '%unknown%')-- 505 is down and unknown, unknowns are fairly common 
or StartTime.EventType = 513 -- unreachable 
or StartTime.EventType = 511) -- suspended 
 
-- Here's the trick to get the query to dynamically pull details for the node you are looking at 
AND StartTime.NetObjectType like 'a%' 
and daydiff(starttime.eventtime,getdate())<31 
 
-- Order our results from the newest to oldest 'Down Event' 
ORDER BY [Down Event] DESC
) ev on a.applicationid = ev.applicationid

group by a.node.Caption, a.node.DetailsUrl, a.name, a.DetailsUrl
- Marc Netterfield, Github

View solution in original post

0 Kudos

Thank you very much for this.  I get There was an error processing the request. when adding this as a custom query.  does it work for you as is?  Thanks again!

0 Kudos
In the custom query widget you probably need to add an order by, I can never tell ahead of time which queries it is going to complain about on the web console.

just add this after the last line and see how it goes.
ORDER BY a.node.Caption
- Marc Netterfield, Github
0 Kudos