What We're Working On
over 2 years ago

SNMP Traps and Syslog Traps can raise an advanced alert in NPM

At the moment, received SNMP Traps and Syslog messages are only parsed and written directly into the NPM database.

The alerting interface for Traps is completely separate from the NPM advanced alerting engine, so nothing else happens in NPM with that valuable collected information.

Additionally, a received critical Trap has no effect on the node's status.

So, to this day, SolarWinds NPM remains an entirely "polling-centric" solution.

Most competing monitoring products on the market support more comprehensive alerting and node status views based on polling AND on Traps.

What we want:

1. Any received Trap (Syslog or SNMP) that matches customizable pattern/content criteria should be able to raise an alert in the NPM advanced alert manager.

2. The trigger criteria should support regular expressions, so that any text content/field can be matched within a Trap.

3. The reset criteria should work the same way, so that raised alerts can be cleared by the same or different Traps based on a customizable pattern.

4. Admin should be able to choose for alert creation:

     a. Repeatedly received Traps with the same pattern/content should be ignored by the alerting engine once the alert has been triggered (the alert should of course remain active)

     b. Repeatedly received Traps with the same pattern/content should increment a counter in the database, so that alerts can be escalated or triggered based on that counter value (e.g. more than 10 identical messages should trigger an alert)

5. These Trap Alerts (Syslog/SNMP) should change the node status LED in the GUI.
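
To make the wish list above a bit more concrete, here is a minimal sketch in Python of how such a rule could behave. This is not NPM code and not an existing SolarWinds API: the class `TrapAlertRule`, its fields, the return values and the `node_status` helper are all hypothetical names made up for illustration. It assumes the reset pattern is checked before the trigger pattern, and that the alert fires once the counter reaches the threshold.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TrapAlertRule:
    """Hypothetical rule: regex trigger/reset plus a repeat threshold."""
    name: str
    trigger: str          # regex that counts towards raising the alert (item 2)
    reset: str            # regex that clears the alert (item 3)
    threshold: int = 1    # matching messages needed before the alert fires (item 4b)
    count: int = field(default=0, init=False)
    active: bool = field(default=False, init=False)

    def process(self, message: str) -> str:
        """Feed one Trap/Syslog message and report what the engine would do."""
        if re.search(self.reset, message):
            was_active = self.active
            self.count = 0
            self.active = False
            return "reset" if was_active else "ignored"
        if re.search(self.trigger, message):
            self.count += 1
            if self.active:
                return "suppressed"  # item 4a: repeats ignored, alert stays active
            if self.count >= self.threshold:
                self.active = True
                return "triggered"
            return "counted"         # item 4b: below threshold, only counted
        return "ignored"

    def node_status(self) -> str:
        """Item 5: status the node LED would show (assumed labels)."""
        return "critical" if self.active else "up"


rule = TrapAlertRule(
    name="link-down",
    trigger=r"LINK-3-UPDOWN.*changed state to down",
    reset=r"LINK-3-UPDOWN.*changed state to up",
    threshold=2,
)
messages = [
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down",
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to up",
]
results = [rule.process(m) for m in messages]
print(results)
```

With `threshold=1` the rule behaves like option 4a alone (first match triggers, repeats are suppressed); a higher threshold gives the counter-based behaviour of option 4b.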

Why this makes sense:

Most vendors (Cisco included!) expose more alerting features/messages via SNMP Traps than via SNMP polling.

Alerting based on Traps is much faster than alerting based on polling.

Alerting based on Traps is more efficient on the network than polling, as messages are only generated when a failure condition occurs.

Lots of customers use competing products that support proper SNMP Trap handling and will not switch to SolarWinds as long as this is not supported there.

There are already a lot of requests in the thwack community for Traps (SNMP, Syslog) to raise an alert:

http://thwack.solarwinds.com/message/195721

http://thwack.solarwinds.com/message/122581#122581

http://thwack.solarwinds.com/message/174510#174510

http://thwack.solarwinds.com/message/228483#228483

http://thwack.solarwinds.com/message/192825#192825

http://thwack.solarwinds.com/message/217626#217626

http://thwack.solarwinds.com/message/212761#212761

http://thwack.solarwinds.com/message/36239#36239

Parents
  • An alert is an alert whether it comes from polling or from a generated event such as a trap.

    It needs to be managed from the same place (advanced alert manager)  and viewable from the same place (alert browser).

    Regarding the original posting...a reset may not be feasible...remember a trap is an unsolicited event. 

    There may not be a "happy trap" when the issue is resolved.

    This is where some aspect of agent-based monitoring comes into play...the heavy lifting occurs at the managed node and only the exception conditions are reported back up (event-wise).

    Management by exception. Report the exceptions...the error conditions, not the happy messages. Too many happy messages create a level of noise that can bury the problems.

    Polling for statistical data can either occur from the management server (polling engines) or fed back up from the agent...
