cancel
Showing results for 
Search instead for 
Did you mean: 
Create Post

Alert History - Cleared Time

Jump to solution

Hello,

    I'm trying to find a way to get a hold of the cleared time for a historical alert. Basically, when an alert comes in when nobodies watching, and then clears itself, we would like to know how long that alert was active for. I thought we might be able to perform a custom SWQL query. The object Orion.AlertHistory has a 'timestamp' property, but nothing that might indicate the time it cleared.

   My next thought was to maybe create a custom property for the alert in question, and then use a reset action on the alert to change that properties value to the current time, effectively capturing the time it reset.

   Does that make sense? Am I on the right track?

   This is all pretty far out of my day-to-day, so I'm hoping one of you fine folks might be able to point me in the right direction. Any advice would be greatly appreciated.

 

   Thanks!

0 Kudos
1 Solution

Custom Alert SLA Report

mesverrum_0-1586457854408.png

 

 

--report on alerts triggered  
 
select ac.Name 
,ah.Message 
,'/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:'+ToString(AlertObjectID) as [_linkfor_Name] 
,case When RelatedNodeCaption!= EntityCaption THEN concat(isnull(RelatedNodeCaption,''),' - ',isnull(EntityCaption,''))
else EntityCaption
End AS [ALERT OBJECT] 
,EntityDetailsURL AS [_LinkFor_ALERT OBJECT]
--end as [Minutes Until Acknowledged] 
--,ack.Message as [Note] 
,tostring(tolocal(ah.TimeStamp)) as [Trigger Time] 
--,case when ack.timestamp is null then 'N/A' 
--else tostring(minutediff(ah.TimeStamp,ack.timestamp)) 
,tostring(isnull(tolocal(reset.TimeStamp),'Still Active')) as [Clear Time] 
,case when reset.timestamp is null then minutediff(ah.TimeStamp,GETUTCDATE())
else minutediff(ah.TimeStamp,reset.timestamp)
end as [Minutes Until Clear] 

 
FROM Orion.AlertHistory ah 
left join Orion.AlertObjects ao on ao.alertobjectid=ah.alertobjectid 
left join Orion.AlertConfigurations ac on ac.alertid=ao.alertid 
--left join Orion.Actions a on a.actionid=ah.actionid 
--left join (select timestamp, AlertActiveID, AlertObjectID,message from orion.alerthistory ah where eventtype=2) ack on ack.alertactiveid=ah.AlertActiveID and ack.alertobjectid=ah.AlertObjectID 
left join (select timestamp, AlertActiveID, AlertObjectID from orion.alerthistory ah where eventtype in (1,8)) reset on reset.alertactiveid=ah.AlertActiveID and reset.alertobjectid=ah.AlertObjectID 
 
WHERE  
ah.eventtype=0 
--and (ac.Name like '%${SEARCH_STRING}%' or EntityCaption like '%${SEARCH_STRING}%' or RelatedNodeCaption like '%${SEARCH_STRING}%') 
 
order by ah.timestamp desc

 

- Marc Netterfield, Github

View solution in original post

5 Replies
Level 16

Take a look at this thwack post this might point you in the right direction.

https://thwack.solarwinds.com/t5/NPM-Discussions/How-to-get-the-Alert-triggered-duration/td-p/73393

0 Kudos
In the alert history table an event type of 2 is an automated clear and an 8 is a manual cleared alert. I have a report that does exactly what you describe that I can dig up tomorrow
- Marc Netterfield, Github
0 Kudos

Custom Alert SLA Report

mesverrum_0-1586457854408.png

 

 

--report on alerts triggered  
 
select ac.Name 
,ah.Message 
,'/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:'+ToString(AlertObjectID) as [_linkfor_Name] 
,case When RelatedNodeCaption!= EntityCaption THEN concat(isnull(RelatedNodeCaption,''),' - ',isnull(EntityCaption,''))
else EntityCaption
End AS [ALERT OBJECT] 
,EntityDetailsURL AS [_LinkFor_ALERT OBJECT]
--end as [Minutes Until Acknowledged] 
--,ack.Message as [Note] 
,tostring(tolocal(ah.TimeStamp)) as [Trigger Time] 
--,case when ack.timestamp is null then 'N/A' 
--else tostring(minutediff(ah.TimeStamp,ack.timestamp)) 
,tostring(isnull(tolocal(reset.TimeStamp),'Still Active')) as [Clear Time] 
,case when reset.timestamp is null then minutediff(ah.TimeStamp,GETUTCDATE())
else minutediff(ah.TimeStamp,reset.timestamp)
end as [Minutes Until Clear] 

 
FROM Orion.AlertHistory ah 
left join Orion.AlertObjects ao on ao.alertobjectid=ah.alertobjectid 
left join Orion.AlertConfigurations ac on ac.alertid=ao.alertid 
--left join Orion.Actions a on a.actionid=ah.actionid 
--left join (select timestamp, AlertActiveID, AlertObjectID,message from orion.alerthistory ah where eventtype=2) ack on ack.alertactiveid=ah.AlertActiveID and ack.alertobjectid=ah.AlertObjectID 
left join (select timestamp, AlertActiveID, AlertObjectID from orion.alerthistory ah where eventtype in (1,8)) reset on reset.alertactiveid=ah.AlertActiveID and reset.alertobjectid=ah.AlertObjectID 
 
WHERE  
ah.eventtype=0 
--and (ac.Name like '%${SEARCH_STRING}%' or EntityCaption like '%${SEARCH_STRING}%' or RelatedNodeCaption like '%${SEARCH_STRING}%') 
 
order by ah.timestamp desc

 

- Marc Netterfield, Github

View solution in original post

Circled back around quite a ways, haha. But actually you were right, this was pretty spot on to what I needed. The only issue I'm having right now is I'm not seeing any alert that's currently active. I did some modifications to your query to catch only alerts that were active for more than 15 minutes without being acknowledged, and that seems to work well. If you see anything obvious in here please let me know:

 

select ac.Name
,ah.Message
,'/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:'+ToString(AlertObjectID) as [_linkfor_Name]
,EntityCaption as [Trigger Object]
,EntityDetailsUrl as [_linkfor_Trigger Object]
,case
WHEN RelatedNodeCaption=EntityCaption THEN 'Self'
When RelatedNodeCaption!=EntityCaption THEN RelatedNodeCaption
End as [Parent Node]
,RelatedNodeCaption
,RelatedNodeDetailsUrl as [_linkfor_Parent Node]
,'/Orion/images/StatusIcons/Small-' + p.StatusIcon AS [_IconFor_Parent Node]
,tostring(tolocal(ah.TimeStamp)) as [Trigger Time]
,case when ack.timestamp is null and minutediff(ah.TimeStamp,reset.timestamp)>15 then tostring(minutediff(ah.TimeStamp,reset.TimeStamp))
else tostring(minutediff(ah.TimeStamp,ack.timestamp))
end as [Minutes Until Acknowledged/Reset]
,ack.Message as [Note]
,case when reset.timestamp is null then 'N/A'
else tostring(minutediff(ah.TimeStamp,reset.timestamp))
end as [Minutes Until Reset]


FROM Orion.AlertHistory ah
left join Orion.AlertObjects ao on ao.alertobjectid=ah.alertobjectid
left join Orion.AlertConfigurations ac on ac.alertid=ao.alertid
left join Orion.Actions a on a.actionid=ah.actionid
left join Orion.Nodes p on p.nodeid=RelatedNodeID
left join (select timestamp, AlertActiveID, AlertObjectID,message from orion.alerthistory ah where eventtype=2 or eventtype=8) ack on ack.alertactiveid=ah.AlertActiveID and ack.alertobjectid=ah.AlertObjectID
left join (select timestamp, AlertActiveID, AlertObjectID from orion.alerthistory ah where eventtype=1) reset on reset.alertactiveid=ah.AlertActiveID and reset.alertobjectid=ah.AlertObjectID


WHERE
daydiff(ah.timestamp,GETUTCDATE())<30
and not minutediff(ah.timestamp,ack.timestamp)<15
and minutediff(ah.timestamp,reset.timestamp)>15

and ah.eventtype=0


--and (ac.Name like '%${SEARCH_STRING}%' or EntityCaption like '%${SEARCH_STRING}%' or RelatedNodeCaption like '%${SEARCH_STRING}%')
--and p.customproperties.customername like 'whatever'

order by ah.timestamp desc

0 Kudos

Hey, yeah, that's pretty close. I ended up finding a query in this forum, I think actually from you, that was even closer. I think our idea is to eventually get a report of alerts that are active for more than 15 minutes and that were not acknowledged. So I know there's plenty of reports out there that show acknowledged/unacknowledged. Just a matter of knitting that together with an outage duration, or I guess alert duration. This is actually what I'm working with now (again, not my own work):

select n.caption as [Device]
-- shows the current status icon
, '/Orion/images/StatusIcons/Small-' + n.StatusIcon AS [_IconFor_Device]
-- makes a clickable link to the node details
, n.DetailsUrl as [_linkfor_Device]
-- shows the timestamp of the down event, if there is no timestamp then is says the event was greater than the number of days in your event retention settings
, isnull(tostring(t2.[Down Event]),concat('Greater than ',(SELECT CurrentValue FROM Orion.Settings where settingid='SWNetPerfMon-Settings-Retain Events'),' days ago')) as [Down Event]
-- shows the timestamp of the up event, unless the object is still down
, isnull(tostring(t2.[Up Event]),'Still Down') as [Up Event]
-- figures out the minutes between the down and up events, if the object is still down it counts from the down event to now, displays 99999 if we cannot accurately determine the original downtime, and
, isnull(MINUTEDIFF(t2.[Down Event], isnull(t2.[Up Event],GETDATE())),99999) as Minutes


from orion.nodes n
left join (SELECT
-- Device nodeid used for our join
StartTime.Nodes.NodeID

-- Down Event time stamp in local time zone
,ToLocal(StartTime.EventTime) AS [Down Event]

-- Up Event time stamp in local time zone
,(SELECT TOP 1
ToLocal(EventTime) AS [EventTime]
FROM Orion.Events AS [EndTime]
-- picks the first up event that is newer than the down event for this node
WHERE EndTime.EventTime >= StartTime.EventTime
-- EventType 5001 is an alert reset
AND EndTime.EventType = 5001
AND EndTime.NetObjectID = StartTime.NetObjectID
AND EventTime IS NOT NULL
ORDER BY EndTime.EventTime
) AS [Up Event]

-- This is the table we are querying
FROM Orion.Events StartTime

-- EventType 5000 is an alert triggered
WHERE StartTime.EventType = 5000

) t2 on n.NodeID = t2.nodeid


-- this is how I catch nodes that are down but have aged out of the events table
where (n.status = 2 or t2.nodeid is not null)


-- If you want to filter the results to only show outages of a minimum duration uncomment the below line
--and MINUTEDIFF(isnull(t2.[Down Event],(GETUTCDATE()-30)), isnull(t2.[Up Event],GETUTCDATE())) > 15


-- if you want to use this query in a search box of the Custom Query resource uncomment the below line
--and n.Caption like '%${SEARCH_STRING}%'


order by t2.[down event] desc

 

 

 

The event types were originally 1 and I think 0 for node up and node down. I just swapped those for 5001 and 5000 for alert triggered/reset. It's a start anyhow, and I'll keep plugging away. This is my first foray into sql or swql, so learning as I go. I'll take any other advice you've got but otherwise, thanks for your help!

0 Kudos