Alerting is the bane of my existence. We have all alerts emailed to our BlackBerrys, so it is imperative to be able to suppress alerts once they are acknowledged or when they are very repetitive. That is, I want the alert when the condition occurs so I can start the workflow, but I don't need an alert telling me a device has a bad battery every two minutes all night long.
Currently, I am trying to manage our UPS alerts. We are using APC Smart-UPS units with AP9617/AP9618/AP9630 network management cards. I have nearly 350 of them scattered over 100 campus locations in multiple equipment closets at each site.
I need to alert on the following conditions:
Battery disconnected: PowerNet-MIB:apc.0.80
At least one bad battery: PowerNet-MIB:apc.0.17
UPS failed a self-test: PowerNet-MIB:apc.0.3
Battery power is too low: PowerNet-MIB:apc.0.4
UPS on battery: PowerNet-MIB:apc.0.5
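For anyone matching these traps outside of NPM, the list above can be written as a small lookup table. This is only a sketch in Python: the trap numbers and descriptions come from the list above, the numeric prefix assumes APC's enterprise OID 1.3.6.1.4.1.318, and the names APC_TRAPS, trap_oid and describe are made up for illustration.

```python
# Hypothetical lookup table for the five traps listed above.
# Assumption: "PowerNet-MIB:apc" resolves to APC's enterprise OID 1.3.6.1.4.1.318.
APC_ENTERPRISE_OID = "1.3.6.1.4.1.318"

APC_TRAPS = {
    80: "Battery disconnected",
    17: "At least one bad battery",
    3: "UPS failed a self-test",
    4: "Battery power is too low",
    5: "UPS on battery",
}


def trap_oid(specific):
    """Build the numeric trap OID for PowerNet-MIB:apc.0.<specific>."""
    return f"{APC_ENTERPRISE_OID}.0.{specific}"


def describe(oid):
    """Return the condition text for a trap OID, or None if it is not in the table."""
    prefix = APC_ENTERPRISE_OID + ".0."
    if not oid.startswith(prefix):
        return None
    tail = oid[len(prefix):]
    if not tail.isdigit():
        return None
    return APC_TRAPS.get(int(tail))
```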
I am using a combination of poller-based advanced alerts and trap alerts.
The problem is I cannot get the poller to work consistently for some alerts. It simply doesn't detect these conditions consistently across the WAN.
With my trap alerts, there's the obvious design flaw that the trap engine doesn't integrate with, or have the same options as, the other alerting engines, so I can't have the alarm pop into the "nodes with problems" section of my NPM website. More than that, I cannot get it to suppress successfully. I set my trigger threshold to at least 2 traps in 10 minutes, and set it to suspend further alerts for a period of time. However, alerts keep spamming my inbox every couple of minutes anyway.
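To make the behavior I am after concrete, here is a minimal sketch in Python of "alert after at least 2 traps in 10 minutes, then stay quiet for a while". It is not how the SolarWinds trap engine works internally; the class name TrapSuppressor, its parameters, and the one-hour quiet period are assumptions chosen for illustration.

```python
from collections import defaultdict, deque


class TrapSuppressor:
    """Hypothetical per-device, per-trap alert throttle (illustration only)."""

    def __init__(self, threshold=2, window=600.0, quiet=3600.0):
        self.threshold = threshold          # traps needed to trigger an alert
        self.window = window                # counting window in seconds (10 minutes)
        self.quiet = quiet                  # suppression period in seconds (assumed 1 hour)
        self._recent = defaultdict(deque)   # (node, trap) -> recent trap timestamps
        self._quiet_until = {}              # (node, trap) -> end of suppression

    def should_alert(self, node, trap, now):
        """Return True only when this trap should produce an email."""
        key = (node, trap)
        # Still inside the quiet period: drop the trap without counting it.
        if now < self._quiet_until.get(key, float("-inf")):
            return False
        times = self._recent[key]
        times.append(now)
        # Forget traps that have fallen out of the counting window.
        while times and now - times[0] > self.window:
            times.popleft()
        if len(times) >= self.threshold:
            # Threshold met: alert once, then suppress for the quiet period.
            self._quiet_until[key] = now + self.quiet
            times.clear()
            return True
        return False
```

With these settings, a bad-battery trap arriving every two minutes would produce one email per quiet period instead of one every couple of minutes, and each device and trap type is throttled independently.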
Of course, APC wants me to switch to their InfraStruxure UPS monitoring and management software. I want to keep a single network monitoring and management solution, but this is an issue my boss is pounding on me about, and our SolarWinds license is coming up for renewal. This issue could be a tipping point for us.
Anyone else with similar problems or advice, please comment.