In my previous tool tip, "Syslog Charts (also alerts, traps, events)", I described ways to chart alert, syslog, and trap information.
With some tweaking, we can add this information to the node details page to analyze which nodes are generating the most alerts, and which alerts they are.
Two general resources are needed.
The first one points to an issue with the "Node is down" alert. The second one shows that one particular node triggered that alert excessively.
Resource 1: today's alert count by alert name (SWQL)

```sql
SELECT ah.AlertObjects.AlertConfigurations.name,
       count(*) as [AlertCount]
from Orion.AlertHistory ah
where DAYDIFF(ah.TimeStamp, getdate()) = 0
  and ah.AlertObjects.AlertConfigurations.name is not NULL
group by ah.AlertObjects.AlertConfigurations.name
order by [AlertCount] desc
```

Resource 2: today's alert count by node and alert name (SWQL). The [_LinkFor_Node] column turns each node name into a link.

```sql
SELECT ao.RelatedNodeCaption as [Node]
      ,ao.EntityDetailsUrl as [_LinkFor_Node]
      ,ao.AlertConfigurations.Name as [Alert Name]
      ,count(ao.AlertConfigurations.Name) as [count]
FROM Orion.AlertObjects ao
where DAYDIFF(ao.alerthistory.TimeStamp, GETDATE()) = 0
group by ao.relatednodecaption, ao.AlertConfigurations.Name, ao.EntityDetailsUrl
order by [count] desc
```
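As a rough illustration of what these two summary queries compute, here is a minimal Python sketch that applies the same filtering and grouping to a few in-memory rows. The row layout and sample values are made up and only stand in for Orion.AlertHistory and Orion.AlertObjects; DAYDIFF(TimeStamp, GETDATE()) = 0 is approximated by comparing calendar dates.

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical alert-history rows: (timestamp, node, alert name).
# Made-up sample data standing in for Orion.AlertHistory / Orion.AlertObjects.
ROWS = [
    (datetime(2024, 9, 21, 8, 5), "node-a", "Node is down"),
    (datetime(2024, 9, 21, 8, 9), "node-a", "Node is down"),
    (datetime(2024, 9, 21, 9, 30), "node-a", "Node is down"),
    (datetime(2024, 9, 21, 10, 0), "node-b", "High CPU"),
    (datetime(2024, 9, 21, 11, 0), "node-b", None),            # no alert name
    (datetime(2024, 9, 20, 23, 50), "node-b", "Node is down"),  # previous day
]


def alerts_by_name(rows, day):
    """Resource 1: count the day's alerts per alert name, skipping NULL names."""
    counts = Counter(
        name for ts, _node, name in rows
        if ts.date() == day and name is not None
    )
    return counts.most_common()  # like ORDER BY [AlertCount] DESC


def alerts_by_node_and_name(rows, day):
    """Resource 2: count the day's alerts per (node, alert name) pair."""
    counts = Counter((node, name) for ts, node, name in rows if ts.date() == day)
    return counts.most_common()  # like ORDER BY [count] DESC


if __name__ == "__main__":
    today = date(2024, 9, 21)  # fixed "today" so the output is reproducible
    print(alerts_by_name(ROWS, today))
    print(alerts_by_node_and_name(ROWS, today))
```

Counter.most_common() plays the role of the ORDER BY ... DESC clause; the node with one dominant (node, alert) pair is the one worth drilling into.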
Clicking the node name takes us to that node's details page, which holds the following two resources.
Resource 3: today's alerts for this node, linked to the active alert details page (SWQL). Replace "insert server here" with your Orion server's address.

```sql
select AlertHistory.AlertObjects.AlertConfigurations.Name as [Alert Name],
       Message,
       AlertHistory.AlertObjects.EntityCaption as [Triggering Object],
       ToLocal(Timestamp) as [Time],
       AlertHistory.AlertObjects.RelatedNodeCaption as [Related Node],
       'https://insert server here/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:' + ToString(AlertObjectID) as [_linkfor_Message],
       'https://insert server here/Orion/NetPerfMon/ActiveAlertDetails.aspx?NetObject=AAT:' + ToString(AlertObjectID) as [_linkfor_Alert Name]
from Orion.AlertHistory
where AlertHistory.AlertObjects.RelatedNodeID = '${NodeID}'
  and daydiff(AlertHistory.TimeStamp, getdate()) = 0
  and EventType = 0
order by TimeStamp desc
```

Resource 4: per-day alert counts by alert name for this node over the last 30 days (SQL against AlertHistoryView rather than SWQL)

```sql
select convert(date, ah1.timestamp) [Date1]
      ,ah1.name
      ,count(name) [Count of ]
      ,'total' [total]
from AlertHistoryView ah1
where DATEDIFF(day, ah1.timestamp, getdate()) < 30
  and ah1.RelatedNodeId = ${nodeid}
group by convert(date, ah1.timestamp), ah1.name
```
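The 30-day query buckets one node's alerts by calendar day and alert name, which is what makes the start of a spike visible. A minimal Python sketch of the same grouping follows; the rows, node IDs, and the function name daily_counts are hypothetical, and DATEDIFF(day, timestamp, GETDATE()) < 30 is approximated as "fewer than 30 calendar days before today".

```python
from collections import Counter
from datetime import date, datetime

# Hypothetical rows: (timestamp, node id, alert name).
# Made-up sample data standing in for AlertHistoryView.
NODE_ROWS = [
    (datetime(2024, 9, 20, 12, 0), 7, "Node is down"),
    (datetime(2024, 9, 21, 8, 5), 7, "Node is down"),
    (datetime(2024, 9, 21, 8, 9), 7, "Node is down"),
    (datetime(2024, 9, 21, 9, 0), 7, "High CPU"),
    (datetime(2024, 8, 1, 10, 0), 7, "Node is down"),   # outside the window
    (datetime(2024, 9, 21, 10, 0), 8, "Node is down"),  # a different node
]


def daily_counts(rows, node_id, today, window_days=30):
    """Count one node's alerts per (day, alert name) within the window."""
    counts = Counter(
        (ts.date(), name)
        for ts, nid, name in rows
        if nid == node_id and (today - ts.date()).days < window_days
    )
    # Sort by day, then alert name, so the buckets line up along a chart's x-axis.
    return sorted(counts.items())


if __name__ == "__main__":
    for (day, name), count in daily_counts(NODE_ROWS, 7, date(2024, 9, 21)):
        print(day, name, count)
```

Grouping by the date part only (convert(date, ...) in the SQL, ts.date() here) is what collapses individual triggers into one bar per day.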
It looks like the excessive triggers on node xyz started on Sept 21, which gives me a good starting point for diagnosing the issue.
Looking at the "Average Response Time & Packet Loss" chart, there appears to be excessive packet loss.
-Thanks
Amit