SNMP Traps and Syslog Traps can raise an advanced alert in NPM

At the moment, received SNMP traps and Syslog messages are only parsed and written directly into the NPM database.

The alerting interface for traps is separate from the NPM Advanced Alerting engine; beyond that, NPM does nothing else with this valuable information.

Additionally, a received critical trap has no effect on the node's status.

So to this day, SolarWinds NPM is an entirely "polling-centric" solution.

Most competing monitoring products on the market support more comprehensive alerting and node status views based on both polling AND traps.

What we want:

1. Any received trap, Syslog or SNMP, should be able to raise an alert in the NPM Advanced Alert Manager based on customizable pattern/content criteria.

2. The trigger criteria should support regular expressions, so that any text content or field in a trap can be matched.

3. The reset criteria should work the same way, so that raised alerts can be cleared by the same or different traps based on a customizable pattern.

4. When creating an alert, the admin should be able to choose between:

     a. Repeatedly received traps with the same pattern/content are ignored by the alerting engine after the alert has been triggered once; the alert should, of course, remain active.

     b. Repeatedly received traps with the same pattern/content increment a counter in the database, so alerts can be triggered or escalated based on that counter value (e.g. more than 10 identical messages trigger an alert).

5. These trap alerts (Syslog/SNMP) should change the node status LED in the GUI.
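To make points 1 to 4 concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is an illustration only: `TrapRule`, `TrapAlertEngine`, and the threshold field are hypothetical names, not part of NPM or any SolarWinds API. Each rule has a regex trigger and a regex reset; `threshold=1` behaves like option 4a (raise once, ignore repeats while active) and a larger threshold behaves like option 4b (count identical messages and raise at N).

```python
import re
from dataclasses import dataclass

@dataclass
class TrapRule:
    """Hypothetical trap alert rule: regex trigger/reset plus a repeat threshold."""
    name: str
    trigger: re.Pattern
    reset: re.Pattern
    threshold: int = 1  # 1 = raise on first match (option 4a); N > 1 = counter (option 4b)

@dataclass
class AlertState:
    count: int = 0        # matching traps seen since the last reset
    active: bool = False  # whether the alert is currently raised

class TrapAlertEngine:
    """Evaluates each incoming trap/syslog message against every rule, per node."""

    def __init__(self, rules):
        self.rules = rules
        self.state = {}  # (rule name, node) -> AlertState

    def process(self, node, message):
        """Return the (event, rule name, node) tuples caused by one message."""
        events = []
        for rule in self.rules:
            st = self.state.setdefault((rule.name, node), AlertState())
            if rule.reset.search(message):
                # The reset pattern clears both the alert and its counter.
                if st.active:
                    events.append(("reset", rule.name, node))
                st.count, st.active = 0, False
            elif rule.trigger.search(message):
                st.count += 1
                # Raise once when the threshold is reached; later repeats only
                # increment the counter while the alert stays active.
                if not st.active and st.count >= rule.threshold:
                    st.active = True
                    events.append(("trigger", rule.name, node))
        return events

hsrp = TrapRule("hsrp-active",
                trigger=re.compile(r"HSRP.*Standby -> Active"),
                reset=re.compile(r"HSRP.*Active -> Standby"))
flap = TrapRule("link-flap",
                trigger=re.compile(r"LINK-3-UPDOWN"),
                reset=re.compile(r"(?!)"),  # never matches: this rule has no reset pattern
                threshold=3)
engine = TrapAlertEngine([hsrp, flap])
```

Feeding the same "Standby -> Active" message twice raises the HSRP alert only once, and a later "Active -> Standby" message clears it; the link-flap rule stays silent until the third matching message.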

Why this makes sense:

Most vendors (Cisco included!) expose more alerting features/messages via SNMP traps than via SNMP polling.

Alerting based on traps is much faster than alerting via polling.

Alerting based on traps is more efficient on the network than polling, as messages are only sent when a failure condition occurs.

Many customers use competing products with proper SNMP trap handling and will not switch to SolarWinds as long as this is not supported there.

There are already many requests in the thwack community for traps (SNMP, Syslog) to raise alerts:

http://thwack.solarwinds.com/message/195721

http://thwack.solarwinds.com/message/122581#122581

http://thwack.solarwinds.com/message/174510#174510

http://thwack.solarwinds.com/message/228483#228483

http://thwack.solarwinds.com/message/192825#192825

http://thwack.solarwinds.com/message/217626#217626

http://thwack.solarwinds.com/message/212761#212761

http://thwack.solarwinds.com/message/36239#36239

113 Comments

The alert manager doesn't take a real-time feed of data, so this is a more fundamental change to the way it works.

It reaches deeper into the rest of the product, potentially making it much harder to use for existing users and breaking existing functionality.

(1, 2, 3) really means adding new rules for generating alerts in the alert manager,

e.g. rules allowing time-bounded checks of the contents of the traps and syslog tables for certain data.

You can do this today by using custom SQL alerts on nodes.
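A time-bounded check of the kind described here boils down to "which nodes logged a matching message in the last N minutes?". The sketch below uses Python's built-in `sqlite3` with a hypothetical `traps(node_id, received_at, message)` table as a stand-in; it is not the actual Orion schema, and a real custom SQL alert would run against the NPM SQL Server database with its own table and column names.

```python
import sqlite3
from datetime import datetime, timedelta

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # fixed-width text timestamps compare correctly as strings

def nodes_with_recent_traps(conn, like_pattern, minutes, now):
    """Return IDs of nodes that logged a matching trap within the last `minutes`."""
    cutoff = (now - timedelta(minutes=minutes)).strftime(TS_FORMAT)
    rows = conn.execute(
        "SELECT DISTINCT node_id FROM traps "
        "WHERE message LIKE ? AND received_at >= ? "
        "ORDER BY node_id",
        (like_pattern, cutoff),
    ).fetchall()
    return [node_id for (node_id,) in rows]

# Hypothetical stand-in for a traps table; not the actual Orion schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traps (node_id INTEGER, received_at TEXT, message TEXT)")
conn.executemany(
    "INSERT INTO traps VALUES (?, ?, ?)",
    [
        (1, "2024-01-01 11:55:00", "Spanning tree topology change on Vlan10"),
        (2, "2024-01-01 10:00:00", "Spanning tree topology change on Vlan20"),  # too old
        (3, "2024-01-01 11:58:00", "Interface Gi0/1 changed state to down"),    # no match
    ],
)
now = datetime(2024, 1, 1, 12, 0, 0)
print(nodes_with_recent_traps(conn, "%topology change%", 15, now))  # [1]
```

Because the alert engine polls such a query on a schedule, the window length bounds how stale a trap can be and still trigger the alert.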

[I think this is a good idea]

(4) already exists in the trap viewer (set a tag on the incoming traps).

(5) would get overwritten by the regular status polling, and so reaches deeper into the product than just generating an alert.

Being able to set the status to Warning, or to set a property, might be useful alert actions.

[I think this is an idea that needs more thought]

One thing that would help would be to have all of the alert actions that are in the alert manager also available in the trap viewer.
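One way around the overwrite problem in (5) is to never store the trap-driven status in the same field the poller writes, and instead compute the displayed status as the worst of the polled status and any active trap alerts. A minimal sketch, assuming a hypothetical four-level severity order (this is not NPM's actual status model):

```python
from enum import IntEnum

class Status(IntEnum):
    """Hypothetical severity order; a higher value is worse."""
    UP = 0
    WARNING = 1
    CRITICAL = 2
    DOWN = 3

def effective_status(polled, active_trap_alerts):
    """Combine the polled status with the severities of active trap alerts.

    Taking the worst value means the next status poll cannot overwrite an
    active trap alert; the node only returns to its polled status once the
    trap alerts are reset.
    """
    return max([polled, *active_trap_alerts])

print(effective_status(Status.UP, [Status.WARNING]).name)  # WARNING
```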

RichardLetts is certainly right about this breaking the current Orion alerting implementation.

This is simply input from a dozen customers where I installed SolarWinds: there are always some devices that rely on trap and/or syslog alerting.

I described the idea as an Alert Manager feature, but I go further and suggest that trap alerts should also affect the node status.

Without a link to the Orion node status, the GUI can diverge from reality: a node may be sending Spanning Tree traps about an unstable network while everything shows green in Orion.

Another example: who polls HSRP status on routers via SNMP? There are perfect syslog messages for it, and if a standby router switches to active, there is a problem that should not just be emailed to an inbox without any color change in the Orion GUI (e.g. the NOC view).

Having to do policy administration in multiple places for alerts and notifications is not supportable in a large environment, and it's not intuitive to the monitoring team that must support policies on multiple instances of Orion.

The fact that trap alerts are not viewable in the alert browser also makes them difficult to centralize into a consolidated command-center-type view, which is what my team has been asked to provide.

As clunky as the NNM alert browser is (even the new one), it has had this functionality for 15 years.

I could definitely use this. We are migrating from CA Unicenter NSM, which primarily uses SNMP traps, evaluating and parsing them to create alerts. I would like to see similar functionality in SolarWinds.

The way traps work is NPM's Achilles' heel.

I would love to copy all my traps into alerts.

An alert is an alert whether it comes from polling or from a generated event such as a trap.

It needs to be managed from the same place (Advanced Alert Manager) and viewable from the same place (alert browser).

Regarding the original posting: a reset may not be feasible. Remember, a trap is an unsolicited event.

There may not be a "happy trap" when the issue is resolved.

This is where some aspect of agent-based monitoring comes into play: the heavy lifting occurs at the managed node, and only the exception conditions are reported back up (event-wise).

Management by exception. Report the exceptions (the error conditions), not the happy messages. Too many happy messages create a level of noise that can bury the problems.

Polling for statistical data can occur either from the management server (polling engines) or be fed back up from the agent.
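Where no clearing ("happy") trap will ever arrive, one common fallback is a time-based auto-clear: the alert stays active while matching traps keep coming and clears itself after a hold period of silence. A minimal sketch; `ExpiringTrapAlert` and the 30-minute hold are illustrative assumptions, not an NPM feature:

```python
from datetime import datetime, timedelta

class ExpiringTrapAlert:
    """Trap-driven alert that clears itself when no matching trap arrives for `hold`."""

    def __init__(self, hold):
        self.hold = hold
        self.last_seen = None  # time of the most recent matching trap

    def on_trap(self, when):
        self.last_seen = when  # each repeat trap extends the alert

    def is_active(self, now):
        return self.last_seen is not None and now - self.last_seen < self.hold

alert = ExpiringTrapAlert(hold=timedelta(minutes=30))
t0 = datetime(2024, 1, 1, 12, 0)
alert.on_trap(t0)
print(alert.is_active(t0 + timedelta(minutes=10)))  # True
print(alert.is_active(t0 + timedelta(minutes=31)))  # False
```

The trade-off is that the hold period must be longer than the device's trap repeat interval, or a still-present fault will flap between active and cleared.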

This has been a problem for us since Day 1.

However, it is now a big deal, as one of our critical monitoring systems is available only as a trap: "SNMP Dying Gasp". We spent millions of dollars on new hardware, and this was a required feature. We have to be able to separate power problems, over which we have no control (and no responsibility), from all other network issues that we need to address. Calling the customer at 2 AM is not an acceptable solution. Since power accounts for 50+% of problems, maybe even 70%, getting this right matters a lot.

But NPM cannot use it in a sensible manner, so we get an email saying "Dying Gasp received", followed by either a "Node rebooted" email if the power event was short, or a "Node Down" email if it lasted over 5 minutes.

I can follow the logic, but I have 30+ years of professional experience, am a programmer by nature, and set these things up. Our Help Desk and operators have a much harder time understanding what to do. NPM *should* be able to fix that, but this missing feature has been in the way for years now.

Please find some way to unify this into one set of alerts that one person can look at at one time and know if there is a problem or not!

And I totally agree with the poster that maintaining alert logic in more than one place is a poor practice.
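The correlation this commenter currently does by hand (a Dying Gasp followed by a reboot or Node Down) could be expressed as a single rule: if the node sent a Dying Gasp shortly before the outage was detected, treat it as a power event instead of a network fault. A minimal sketch; `classify_outage` and the 10-minute default window are assumptions, the window simply being chosen to cover the 5-minute Node Down delay mentioned above:

```python
from datetime import datetime, timedelta

def classify_outage(dying_gasps, node, detected_at, window=timedelta(minutes=10)):
    """Label an outage 'power' if `node` sent a Dying Gasp within `window`
    before the outage was detected, otherwise 'network'.

    dying_gasps: iterable of (node, received_at) pairs from the trap log.
    """
    for gasp_node, gasp_at in dying_gasps:
        # Only gasps from this node, received before detection and inside the window.
        if gasp_node == node and timedelta(0) <= detected_at - gasp_at <= window:
            return "power"
    return "network"

t0 = datetime(2024, 1, 1, 2, 0)
gasps = [("sw1", t0)]
print(classify_outage(gasps, "sw1", t0 + timedelta(minutes=6)))  # power
print(classify_outage(gasps, "sw2", t0 + timedelta(minutes=6)))  # network
```

A "power" result could then route to a different notification than a "network" result, so that only the issues the team can act on page someone at 2 AM.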

Hear, hear!

We have the same problem in my organization. What we've resorted to for both traps and alerts is this:

The Trap or Syslog Viewer runs an external program upon receipt of a message containing our "trigger" string (so part of the message must contain "sw_trigger"). The external program is a Perl script that parses the trap or syslog message and writes the identifier we have specified in that message into a custom field in the Nodes table of the NPM database (or removes it in the case of a clear event). We have Advanced Alert rules that look for these identifiers and raise or clear alerts based on the presence or absence of these identifiers in our custom field in the Nodes table. It's a terrible hack with many moving parts in SolarWinds, and we occasionally miss clear events, which causes a tremendous amount of extra work for our staff.
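The parse-and-tag step of this workaround can be sketched in a few lines. The commenter's version is a Perl script writing to the NPM database; this Python sketch instead assumes a hypothetical `sw_trigger=<identifier> state=raise|clear` message convention and uses an in-memory dict to stand in for the custom field in the Nodes table.

```python
import re

# Hypothetical convention for the text embedded in the trap/syslog message:
#   "... sw_trigger=<identifier> state=raise"   or   "... state=clear"
TRIGGER = re.compile(r"sw_trigger=(?P<ident>[\w-]+)\s+state=(?P<state>raise|clear)")

def apply_message(custom_field, node, message):
    """Add (raise) or remove (clear) the message's identifier in the node's
    custom-field set. Returns True if the field changed."""
    match = TRIGGER.search(message)
    if match is None:
        return False  # not one of our trigger messages
    idents = custom_field.setdefault(node, set())
    before = set(idents)
    if match.group("state") == "raise":
        idents.add(match.group("ident"))
    else:
        idents.discard(match.group("ident"))
    return idents != before

nodes = {}  # stands in for the custom field in the Nodes table
apply_message(nodes, "rtr1", "PSU 1 failed sw_trigger=PSU1 state=raise")
print(nodes)  # {'rtr1': {'PSU1'}}
```

Alert rules then only need to test whether a node's set is non-empty; a missed clear message leaves the identifier in place, which is exactly the failure mode described in the comment.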

If NPM were able to parse the contents of a trap or syslog message directly into an alert, this would save the round trip through our Rube Goldberg machine that runs most alerting in our organization, but which few in our organization truly understand. Thanks in advance for any help with this; we've been asking for it for years!