SolarWinds NPM - Tutorial on how to use SNMP traps in alerts

FormerMember

Introduction

After a long search, I finally found the proper syntax (thank you, Thwack community) to correlate received SNMP traps with other alert values. Here is a short guide on how to use traps in alerts within the SolarWinds NPM GUI.

In this example, I am receiving a "dying gasp" SNMP trap from an Alcatel-Lucent (now Nokia) 7210SASD. When such an event happens, the equipment is basically telling me it lost power. This lets me distinguish nodes that went down because of a power failure from nodes that went down because of a network failure. In other words, I only take action if the node is down due to the network. There isn't much I can do about power at those remote locations or customer premises.

Using Node Custom Properties

It all starts with a Boolean custom property on the nodes, which I called LossOfPower. See the attached picture for more details.

SNMP Traps

The traps have to be sent to SolarWinds. Here is the configuration for the 7210.

        snmp-trap-group 1
            description "SolarWinds 1"
            trap-target "solarwinds1" address <SolarWinds NPM Server IP> snmpv2c notify-community "CatchyNameHere"
        exit
        snmp-trap-group 98
            description "OtherSNMPServers"
            trap-target "Server1" address <Server1 IP> snmpv2c notify-community "snmpv2cSAMtrap98"
            trap-target "Server2" address <Server2 IP> snmpv2c notify-community "snmpv2cSAMtrap98"
        exit
        snmp-dying-gasp primary 1 "solarwinds1" secondary 98 "Server1" tertiary 98 "Server2"

The next step is to create the alert that sets this property. The conditions are written in SQL, not SWQL.

Trigger

SELECT Nodes.NodeID, Nodes.Caption FROM Nodes
INNER JOIN Traps
ON Nodes.NodeID = Traps.NodeID
AND Traps.DateTime > DATEADD(MINUTE, -6, SYSDATETIME())
AND Traps.TrapType = 'TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp ';

The INNER JOIN matches the two tables on NodeID, so only nodes that have sent a matching trap are returned. The time condition limits this to dying gasp traps received in the last 6 minutes.
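
For anyone who wants to reason about the trigger outside of SolarWinds, here is a minimal Python sketch of the same logic. It is only a model of the query, not something SolarWinds runs: the function name nodes_to_trigger, the dictionary layout, and the sample nodes are all made up for illustration.

```python
from datetime import datetime, timedelta

# Trap type string from the trigger query (trailing space dropped).
DYING_GASP = "TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp"
TRIGGER_WINDOW = timedelta(minutes=6)


def nodes_to_trigger(nodes, traps, now):
    """Return (NodeID, Caption) pairs for nodes with a recent dying gasp."""
    cutoff = now - TRIGGER_WINDOW  # mirrors DATEADD(MINUTE, -6, SYSDATETIME())
    recent = {
        trap["NodeID"]
        for trap in traps
        if trap["TrapType"].strip() == DYING_GASP and trap["DateTime"] > cutoff
    }
    # A set collapses repeated traps; the SQL join returns one row per trap.
    return [(n["NodeID"], n["Caption"]) for n in nodes if n["NodeID"] in recent]


# Hypothetical sample data, not taken from a real Orion database.
now = datetime(2024, 1, 1, 12, 0)
nodes = [
    {"NodeID": 1, "Caption": "site-a"},
    {"NodeID": 2, "Caption": "site-b"},
]
traps = [
    {"NodeID": 1, "TrapType": DYING_GASP, "DateTime": now - timedelta(minutes=2)},
    {"NodeID": 2, "TrapType": DYING_GASP, "DateTime": now - timedelta(minutes=30)},
]
print(nodes_to_trigger(nodes, traps, now))  # [(1, 'site-a')]
```

Only site-a is returned, because its trap is 2 minutes old. The trap from site-b is 30 minutes old and falls outside the 6-minute window.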

Reset

SELECT Nodes.NodeID, Nodes.Caption FROM Nodes
INNER JOIN Traps
ON Nodes.NodeID = Traps.NodeID
AND Traps.DateTime < DATEADD(MINUTE, -9, SYSDATETIME())
AND Traps.TrapType = 'TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp '
AND Nodes.Status = 1;

If the dying gasp trap is more than 9 minutes old and the node is back online (Status = 1), this alert is reset.
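
To see how the 6-minute and 9-minute windows relate, here is a small Python sketch that evaluates both conditions for a single trap at different elapsed times. The function names and the timestamp are hypothetical; the two comparisons are copied from the trigger and reset queries.

```python
from datetime import datetime, timedelta

STATUS_UP = 1  # Nodes.Status value the reset query tests for


def trigger_matches(trap_time, now):
    """Trigger condition: dying gasp newer than 6 minutes."""
    return trap_time > now - timedelta(minutes=6)


def reset_matches(trap_time, node_status, now):
    """Reset condition: dying gasp older than 9 minutes and node up."""
    return trap_time < now - timedelta(minutes=9) and node_status == STATUS_UP


trap_time = datetime(2024, 1, 1, 12, 0)  # hypothetical trap timestamp
for elapsed in (3, 7, 10):
    now = trap_time + timedelta(minutes=elapsed)
    print(elapsed, trigger_matches(trap_time, now),
          reset_matches(trap_time, STATUS_UP, now))
# 3 True False
# 7 False False
# 10 False True
```

Between 6 and 9 minutes after the trap, neither condition matches, which appears to act as a buffer between the two. Even after 9 minutes, the reset only matches once the node reports as up.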

Trigger Action

It simply sets the LossOfPower custom property to "Yes".

Reset Action

It sets the LossOfPower custom property back to "No".

Usage

This is modular. The LossOfPower custom property is used in another, much simpler alert (it could be used in several others) that contacts us when a node is down. If the node is down and LossOfPower is set, we do nothing. If it is down for any other reason, we take action.
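
The decision made by that second alert can be sketched in a few lines of Python. This is an illustration of the logic only, not the alert definition itself: the function name needs_action and the sample nodes are hypothetical, and the sketch assumes that Status 2 is the Orion status code for "Down".

```python
STATUS_DOWN = 2  # assumed Orion status code for "Down"


def needs_action(node):
    """True when the node is down and no dying gasp flagged a power loss."""
    return node["Status"] == STATUS_DOWN and not node["LossOfPower"]


# Hypothetical nodes for illustration.
nodes = [
    {"Caption": "site-a", "Status": 2, "LossOfPower": True},   # power failure
    {"Caption": "site-b", "Status": 2, "LossOfPower": False},  # network failure
    {"Caption": "site-c", "Status": 1, "LossOfPower": False},  # node is up
]
actionable = [n["Caption"] for n in nodes if needs_action(n)]
print(actionable)  # ['site-b']
```

Keeping the power check in a custom property means any other alert can reuse it with a single extra condition.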

Testing and Researching

To list all the properties of a table, SolarWinds NPM includes a query test page. It is located at http://<yourserverIP>/Orion/Admin/swis.aspx. Note that the names used there are slightly different from the SQL table names (for example, Orion.Traps rather than Traps).

If Orion.Traps is selected as a source, the Generate Select Query button returns this:

SELECT Acknowledged, ColorCode, Community, DateTime, Description, DisplayName, EngineID, Hostname, InstanceType, IPAddress, NodeID, ObservationRowVersion, ObservationSeverity, ObservationSeverityName, ObservationTimestamp, Tag, TimeStamp, TrapID, TrapType, Uri FROM Orion.Traps

This is useful for finding new fields you might need in your particular case.

You can remove fields from the SELECT and run the query to see what is returned. This is not practical with Orion.Traps, though, because that table is a log of every trap received and can get quite lengthy. Try it on Orion.Nodes instead.

SELECT AgentPort, Allow64BitCounters, AncestorDetailsUrls, AncestorDisplayNames, AvgResponseTime, BlockUntil, BufferBgMissThisHour, BufferBgMissToday, BufferHgMissThisHour, BufferHgMissToday, BufferLgMissThisHour, BufferLgMissToday, BufferMdMissThisHour, BufferMdMissToday, BufferNoMemThisHour, BufferNoMemToday, BufferSmMissThisHour, BufferSmMissToday, Caption, ChildStatus, CMTS, Community, Contact, CPULoad, CustomPollerLastStatisticsPoll, CustomPollerLastStatisticsPollSuccess, CustomStatus, Description, DetailsUrl, DisplayName, DNS, DynamicIP, EngineID, EntityType, External, GroupStatus, Icon, Image, InstanceType, IOSImage, IOSVersion, IP, IP_Address, IPAddress, IPAddressGUID, IPAddressType, IsServer, LastBoot, LastSync, LastSystemUpTimePollUtc, Location, MachineType, MaxResponseTime, MemoryAvailable, MemoryUsed, MinResponseTime, MinutesSinceLastSync, NextPoll, NextRediscovery, NodeDescription, NodeID, NodeName, ObjectSubType, OrionIdColumn, OrionIdPrefix, PercentLoss, PercentMemoryAvailable, PercentMemoryUsed, PollInterval, RediscoveryInterval, ResponseTime, RWCommunity, Severity, SkippedPollingCycles, SNMPVersion, StatCollection, Status, StatusDescription, StatusIcon, StatusIconHint, StatusLED, SysName, SysObjectID, SystemUpTime, TotalMemory, UiSeverity, UnManaged, UnManageFrom, UnManageUntil, Uri, Vendor, VendorIcon FROM Orion.Nodes
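
If you trim these field lists often, a small helper can build the shortened query text to paste into the test page. The sketch below is a hypothetical helper, not part of SolarWinds. It assumes your SWQL version accepts TOP and ORDER BY, which is one way to keep a large table such as Orion.Traps manageable. The field names come from the generated queries above.

```python
import re

# Letters, digits, underscores and dots only (e.g. Orion.Traps, NodeID).
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_.]*$")


def build_swql(entity, fields, top=None, order_by=None):
    """Build a SWQL SELECT string from an entity name and a field list."""
    if not fields:
        raise ValueError("at least one field is required")
    for name in [entity, *fields] + ([order_by] if order_by else []):
        if not _IDENT.match(name):
            raise ValueError(f"unexpected identifier: {name!r}")
    # TOP limits the row count; assumed to be supported by the SWQL version in use.
    limit = f"TOP {int(top)} " if top is not None else ""
    query = f"SELECT {limit}{', '.join(fields)} FROM {entity}"
    if order_by:
        query += f" ORDER BY {order_by} DESC"  # newest rows first
    return query


print(build_swql("Orion.Traps", ["NodeID", "TrapType", "DateTime"],
                 top=10, order_by="DateTime"))
# SELECT TOP 10 NodeID, TrapType, DateTime FROM Orion.Traps ORDER BY DateTime DESC
```

The identifier check is there only to catch typos in field names before the query reaches the test page.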

Using the SWIS Query test page will be the subject of another entry.

Regards,
