We have some simple needs. We implemented the two alerts below, both with email trigger actions (Node Down, Node Rebooted) and an email reset action (Node Down):
Node Down - We want an email when a node goes down, once it has been down for 5 seconds. We then want an email when the node comes back up, once it has been up for 15 seconds.
Alert Rule - Runs Every 1 Minute
Node Rebooted - We want to know when a node reboots.
Alert Rule - Runs Every 1 Minute
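To make the timing question concrete, here is a rough model of an alert that must stay true for a hold time but is only checked when the rule runs. It assumes the hold timer starts at the first rule run that sees the condition. That is an assumption about how the alert engine behaves, not something confirmed here, and `HoldTimerAlert` and `simulate` are hypothetical names, not SolarWinds features.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class HoldTimerAlert:
    """Hypothetical model: fire once the condition has stayed true for
    hold_seconds, but only look at the condition every interval_seconds.
    This is an illustration, not SolarWinds' actual alert engine."""
    hold_seconds: int
    interval_seconds: int
    _true_since: Optional[int] = field(default=None, init=False)
    _fired: bool = field(default=False, init=False)

    def evaluate(self, now: int, condition: bool) -> bool:
        """Check the condition at time `now` (seconds); True means 'send email'."""
        if not condition:
            # Condition cleared: reset the hold timer and re-arm the alert.
            self._true_since = None
            self._fired = False
            return False
        if self._true_since is None:
            # First evaluation that sees the condition starts the hold timer.
            self._true_since = now
        if not self._fired and now - self._true_since >= self.hold_seconds:
            self._fired = True
            return True
        return False


def simulate(alert: HoldTimerAlert, down: List[Tuple[int, int]], end: int) -> List[int]:
    """Return the times (seconds) at which the alert fires for a node that is
    down during each [start, stop) window in `down`."""
    fired_at = []
    for t in range(0, end + 1, alert.interval_seconds):
        is_down = any(start <= t < stop for start, stop in down)
        if alert.evaluate(t, is_down):
            fired_at.append(t)
    return fired_at


# 60-second outage, rule checked every minute: no Down alert at all.
print(simulate(HoldTimerAlert(hold_seconds=5, interval_seconds=60), [(10, 70)], 300))
# Same outage, rule checked every second: fires 5 seconds after the node goes down.
print(simulate(HoldTimerAlert(hold_seconds=5, interval_seconds=1), [(10, 70)], 100))
```

Under this model, a 5-second hold on a rule that runs every minute behaves more like "seen down on two consecutive rule runs", so a short outage can slip through with no Down alert. The 15-second Up hold on the reset would work the same way with the condition flipped to "node is up".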
Here are the alert logs for a test node:
Node Down 5/5/2010 9:12:49 PM TESTNODE has stopped responding (Request Timed Out)
Node Rebooted 5/5/2010 9:13:46 PM TESTNODE rebooted at 5/5/2010 21:11
Node Up 5/5/2010 9:13:46 PM TESTNODE is responding again. Response time is 1 milliseconds
Alert Triggered 5/5/2010 9:13:57 PM Node TESTNODE has rebooted at Wednesday, May 05, 2010 9:12 PM.
Alert Triggered 5/5/2010 9:14:57 PM Node TESTNODE has rebooted at Wednesday, May 05, 2010 9:12 PM.
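Doing the arithmetic on the log above: TESTNODE was logged as down for 57 seconds, and the two Alert Triggered entries are exactly 60 seconds apart, the same as the 1-minute rule interval. The snippet below only parses the timestamps transcribed from the log to show this. It does not read anything from the monitoring system itself.

```python
from datetime import datetime

# (event, timestamp) pairs transcribed from the TESTNODE alert log above.
LOG = [
    ("Node Down", "5/5/2010 9:12:49 PM"),
    ("Node Rebooted", "5/5/2010 9:13:46 PM"),
    ("Node Up", "5/5/2010 9:13:46 PM"),
    ("Alert Triggered", "5/5/2010 9:13:57 PM"),
    ("Alert Triggered", "5/5/2010 9:14:57 PM"),
]
FMT = "%m/%d/%Y %I:%M:%S %p"  # month/day/year, 12-hour clock with AM/PM


def times_of(event: str) -> list:
    """All timestamps for one event type, parsed into datetime objects."""
    return [datetime.strptime(ts, FMT) for kind, ts in LOG if kind == event]


down, up = times_of("Node Down")[0], times_of("Node Up")[0]
alerts = times_of("Alert Triggered")

print((up - down).total_seconds())              # 57.0 seconds logged as down
print((alerts[1] - alerts[0]).total_seconds())  # 60.0 seconds between the two alerts
```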
We received two TESTNODE Up alert emails, one at 9:13 and one at 9:14. At 10:04 we received a TESTNODE Rebooted email stating that it rebooted at 9:12. Why did this happen? Other times we get Up alerts for a node that rebooted and nothing else: no Down and no Rebooted alerts. How do we prevent an Up alert that has no matching Down alert, and if we do, will we then miss valid alerts?
Second question: how do we create an alert that can tell the difference between a node that was deliberately rebooted and one that was hard rebooted? Windows nodes have event log entries that record what happened, though even then I'm not sure how we would incorporate them into the alert logic. Has anyone built an alert that can make this distinction? How would one do this with Cisco switches?
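For reference, the usual Windows System log entries are Event ID 1074 (a process or user initiated the shutdown/restart), 6006 (the Event Log service stopped cleanly), 6008 (the previous system shutdown was unexpected) and 41 (Kernel-Power: the system rebooted without cleanly shutting down first). On Cisco IOS, `show version` includes a line such as "System returned to ROM by reload" or "System returned to ROM by power-on". Below is a rough sketch of turning those into a deliberate/unexpected decision. It assumes the event IDs from around the reboot and the `show version` text have already been collected somehow (a script, a poller, etc.); the function names are hypothetical and nothing here is a SolarWinds feature.

```python
import re
from typing import Iterable

# Windows System log event IDs (collected from the window around the reboot).
CLEAN_RESTART_IDS = {1074, 6006}  # 1074: restart initiated by a process/user; 6006: Event Log service stopped
DIRTY_RESTART_IDS = {41, 6008}    # 41: Kernel-Power, no clean shutdown; 6008: previous shutdown was unexpected


def classify_windows_reboot(event_ids: Iterable[int]) -> str:
    """Hypothetical helper: 'deliberate', 'unexpected' or 'unknown'."""
    ids = set(event_ids)
    # Check the unclean markers first: if both kinds appear in the window,
    # treat the reboot as unexpected rather than hide a crash.
    if ids & DIRTY_RESTART_IDS:
        return "unexpected"
    if ids & CLEAN_RESTART_IDS:
        return "deliberate"
    return "unknown"


# Captures the reason text after "System returned to ROM by" up to end of line.
_ROM_REASON = re.compile(r"System returned to ROM by (.+)", re.IGNORECASE)


def classify_cisco_reboot(show_version: str) -> str:
    """Hypothetical helper working on raw `show version` output."""
    match = _ROM_REASON.search(show_version)
    if not match:
        return "unknown"
    reason = match.group(1).strip().lower()
    if reason.startswith("reload"):
        return "deliberate"  # someone issued a reload
    # power-on (power cycle), bus error, watchdog, software crash, etc.
    return "unexpected"


print(classify_windows_reboot([6005, 6006, 1074]))
print(classify_cisco_reboot("System returned to ROM by power-on\n"))
```

The Windows check deliberately favors "unexpected" when the signals conflict, on the theory that a missed hard reboot is worse than an extra email.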
Any help is appreciated!
John
NPM 10 RC
NTA 3.6
APM 3.5 RC
NCM 5.5
IPAM 1.6