Here is what happened last night. I have a network riding on another network. The lower-level network had an outage at a particular site, which SolarWinds on the lower-level network reported at 8:42 PM. Because the higher-level network rides on the lower-level network, if the lower-level network goes down, the higher-level network should go down at the same time. On the higher-level SolarWinds, I see traps from all of my other sites reporting that they lost their OSPF neighbor at that same time, and my Percent Loss graph went to 100 percent packet loss at that time. The problem is that I did not receive an indication of a triggered alert until 2:41 AM this morning. The trigger alert was logged in SolarWinds at 2:41 AM and the email alert was received at 2:42 AM, so it was not an email problem. I have checked the alert: it is set to trigger after a 0-second delay, check every 1 minute, and is active 24 hours a day. I am running the same version of all SolarWinds components on both the lower- and higher-level networks (NPM 10.7). There are no errors indicating any problems with either SolarWinds or the SQL Server that supports it. Does anyone have any ideas?
Can you check whether the alert engine service was running at the time the issue happened, or whether it was only started around 2:40 AM? If the alert engine only started at that point, this is exactly what you would see: it checked the alert condition and logged the trigger at 2:41 AM, and you received the email alert at 2:42 AM.
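The timing argument above can be illustrated with a small model. This is a hypothetical sketch, not SolarWinds code: it assumes the alert engine evaluates conditions only while its service is running, that the first check happens one check interval after the service starts, and that later checks follow every interval. The function name `first_alert_time` and the dates are placeholders; only the 8:42 PM outage, the 2:40 AM start, the 1-minute interval, and the 0-second delay come from the thread.

```python
from datetime import datetime, timedelta


def first_alert_time(outage_start, engine_start,
                     check_interval=timedelta(minutes=1),
                     trigger_delay=timedelta(0)):
    """Hypothetical model of when an alert first fires.

    Assumes the engine polls every check_interval, with the first poll
    one interval after engine_start, and that the alert fires at the first
    poll at or after the condition has held for trigger_delay.
    """
    # The condition must hold for trigger_delay before it can fire.
    condition_ready = outage_start + trigger_delay
    # Assumption: the first evaluation happens one interval after startup.
    evaluation = engine_start + check_interval
    # Step through poll times until one lands at or after the ready time.
    while evaluation < condition_ready:
        evaluation += check_interval
    return evaluation


# Engine already running before the outage: the alert fires right away.
print(first_alert_time(datetime(2000, 1, 1, 20, 42), datetime(2000, 1, 1, 18, 0)))
# Engine only started at 2:40 AM the next day: the alert fires at 2:41 AM.
print(first_alert_time(datetime(2000, 1, 1, 20, 42), datetime(2000, 1, 2, 2, 40)))
```

Under these assumptions, a 0-second delay and a 1-minute check should produce an alert within about a minute of the outage, so a six-hour gap points at the engine not running (or not evaluating) rather than at the alert definition.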