SolarWinds NPM - Tutorial on how to use SNMP traps in alerts

Version 1

    Introduction

    After a long search, the Thwack community helped me find the proper syntax to correlate received SNMP traps with other alert values (thank you!). Here is a short guide on how to use traps in alerts within the SolarWinds NPM GUI.

     

    In this example, I am receiving a "dying gasp" SNMP trap from an Alcatel-Lucent (now Nokia) 7210 SAS-D. When this happens, the equipment is telling me it has lost power. This lets me tell nodes lost to network failures apart from nodes lost to power failures. In other words, I only take action if the node is down due to the network. There isn't much I can do about power at those remote locations or customer premises.

     

    Using Node Custom Properties

    It all starts with a Boolean (Yes/No) custom property on the nodes, which I called LossOfPower. See the attached picture for more details.

     

    SNMP Traps

    The traps have to be sent to SolarWinds. Here is the configuration for the 7210.

            snmp-trap-group 1
                description "SolarWinds 1"
                trap-target "solarwinds1" address <SolarWinds NPM Server IP> snmpv2c notify-community "CatchyNameHere"
            exit
            snmp-trap-group 98
                description "OtherSNMPServers"
                trap-target "Server1" address <Server1 IP> snmpv2c notify-community "snmpv2cSAMtrap98"
                trap-target "Server2" address <Server2 IP> snmpv2c notify-community "snmpv2cSAMtrap98"
            exit
            snmp-dying-gasp primary 1 "solarwinds1" secondary 98 "Server1" tertiary 98 "Server2"

     

    The next step is to create the alert that sets this property. Its conditions are written in SQL, not SWQL.

    Trigger

    SELECT Nodes.NodeID, Nodes.Caption FROM Nodes
    INNER JOIN Traps
    ON Nodes.NodeID = Traps.NodeID
    AND Traps.DateTime > DATEADD(MINUTE, -6, SYSDATETIME())
    AND Traps.TrapType = 'TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp';

     

    The INNER JOIN matches rows from the Nodes and Traps tables on NodeID. The time filter means that only dying-gasp traps received in the last 6 minutes are considered.

     

    Reset

    SELECT Nodes.NodeID, Nodes.Caption FROM Nodes
    INNER JOIN Traps
    ON Nodes.NodeID = Traps.NodeID
    AND Traps.DateTime < DATEADD(MINUTE, -9, SYSDATETIME())
    AND Traps.TrapType = 'TIMETRA-SAS-SYSTEM-MIB:tmnxDyingGasp'
    AND Nodes.Status = 1;

     

    If the dying-gasp trap is more than 9 minutes old and the node is back online (Status = 1), this alert is reset.
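
    To make the two time windows concrete, here is a minimal Python sketch of the trigger and reset conditions. It is only a model of the SQL above, not something SolarWinds runs. The function names, the in-memory list of trap timestamps, and the fixed "now" are assumptions made for illustration.

```python
from datetime import datetime, timedelta

NODE_UP = 1  # the Nodes.Status value the reset query checks for


def should_trigger(trap_times, now, window_minutes=6):
    """True if any dying-gasp trap arrived within the last `window_minutes`."""
    cutoff = now - timedelta(minutes=window_minutes)
    return any(t > cutoff for t in trap_times)


def should_reset(trap_times, node_status, now, age_minutes=9):
    """True if a dying-gasp trap is older than `age_minutes` and the node is up."""
    cutoff = now - timedelta(minutes=age_minutes)
    return node_status == NODE_UP and any(t < cutoff for t in trap_times)


now = datetime(2024, 1, 1, 12, 0)           # hypothetical current time
recent_gasp = [now - timedelta(minutes=2)]  # trap received 2 minutes ago
old_gasp = [now - timedelta(minutes=10)]    # trap received 10 minutes ago

print(should_trigger(recent_gasp, now))         # True: inside the 6-minute window
print(should_reset(recent_gasp, NODE_UP, now))  # False: trap is too recent
print(should_reset(old_gasp, NODE_UP, now))     # True: old trap and node is up
print(should_reset(old_gasp, 2, now))           # False: node is not up
```

    Note that a trap between 6 and 9 minutes old satisfies neither condition, so the alert simply stays in its current state during that gap.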

     

    Trigger Action

    It simply sets the LossOfPower custom property to "Yes".

     

    Reset Action

    It sets the LossOfPower custom property back to "No".

     

    Usage

    This is modular. The LossOfPower property is used in another, much simpler alert (it could be used in several others) that contacts us when a node is down. If the node is down due to loss of power, we do nothing. If it is down for any other reason, we take action.
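
    The decision made by that second alert can be sketched in a few lines of Python. This is only an illustration of the logic, not the alert definition itself. The function name is hypothetical, and the status code 2 for "Down" is an assumption you should verify on your own installation.

```python
NODE_DOWN = 2  # assumed Orion status code for "Down"; verify on your install


def needs_action(node_status, loss_of_power):
    """Return True only when the node is down and the cause is not a power loss."""
    return node_status == NODE_DOWN and not loss_of_power


print(needs_action(NODE_DOWN, loss_of_power=True))   # False: power outage, do nothing
print(needs_action(NODE_DOWN, loss_of_power=False))  # True: likely a network problem
print(needs_action(1, loss_of_power=False))          # False: node is not down
```

    Keeping the trap handling in its own alert means the "node down" alert only has to read one Yes/No property.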

     

    Testing and Researching

    SolarWinds NPM includes a query test page that lists all the properties of a table. Note that the entity names there are slightly different from the SQL table names (for example, Orion.Traps rather than Traps). It is located at http://<yourserverIP>/Orion/Admin/swis.aspx

    If Orion.Traps is selected as a source, the Generate Select Query button returns this:

    SELECT Acknowledged, ColorCode, Community, DateTime, Description, DisplayName, EngineID, Hostname, InstanceType, IPAddress, NodeID, ObservationRowVersion, ObservationSeverity, ObservationSeverityName, ObservationTimestamp, Tag, TimeStamp, TrapID, TrapType, Uri FROM Orion.Traps

    This is useful for finding new fields you might need in your particular case.

     

    It is possible to remove certain fields from the SELECT and run the query to see what is returned. This isn't practical with Orion.Traps, though, because that table is a log of every trap received and can get quite lengthy. Try it on Orion.Nodes instead.

    SELECT AgentPort, Allow64BitCounters, AncestorDetailsUrls, AncestorDisplayNames, AvgResponseTime, BlockUntil, BufferBgMissThisHour, BufferBgMissToday, BufferHgMissThisHour, BufferHgMissToday, BufferLgMissThisHour, BufferLgMissToday, BufferMdMissThisHour, BufferMdMissToday, BufferNoMemThisHour, BufferNoMemToday, BufferSmMissThisHour, BufferSmMissToday, Caption, ChildStatus, CMTS, Community, Contact, CPULoad, CustomPollerLastStatisticsPoll, CustomPollerLastStatisticsPollSuccess, CustomStatus, Description, DetailsUrl, DisplayName, DNS, DynamicIP, EngineID, EntityType, External, GroupStatus, Icon, Image, InstanceType, IOSImage, IOSVersion, IP, IP_Address, IPAddress, IPAddressGUID, IPAddressType, IsServer, LastBoot, LastSync, LastSystemUpTimePollUtc, Location, MachineType, MaxResponseTime, MemoryAvailable, MemoryUsed, MinResponseTime, MinutesSinceLastSync, NextPoll, NextRediscovery, NodeDescription, NodeID, NodeName, ObjectSubType, OrionIdColumn, OrionIdPrefix, PercentLoss, PercentMemoryAvailable, PercentMemoryUsed, PollInterval, RediscoveryInterval, ResponseTime, RWCommunity, Severity, SkippedPollingCycles, SNMPVersion, StatCollection, Status, StatusDescription, StatusIcon, StatusIconHint, StatusLED, SysName, SysObjectID, SystemUpTime, TotalMemory, UiSeverity, UnManaged, UnManageFrom, UnManageUntil, Uri, Vendor, VendorIcon FROM Orion.Nodes
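
    If you still want to peek at a long table such as Orion.Traps, one option is to trim the field list and cap the number of rows before pasting the query into the test page. The small Python helper below is a hypothetical convenience for building such a query string. It assumes SWQL accepts TOP and ORDER BY in the same way as SQL, which you should confirm against your version.

```python
def build_swql(entity, fields, top=None, order_by=None):
    """Build a SWQL SELECT string for the given entity and field names."""
    top_clause = f"TOP {int(top)} " if top is not None else ""
    query = f"SELECT {top_clause}{', '.join(fields)} FROM {entity}"
    if order_by:
        query += f" ORDER BY {order_by}"
    return query


# Ten most recent traps, with only the fields used in the alert above.
print(build_swql("Orion.Traps", ["NodeID", "DateTime", "TrapType"],
                 top=10, order_by="DateTime DESC"))
# SELECT TOP 10 NodeID, DateTime, TrapType FROM Orion.Traps ORDER BY DateTime DESC
```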

     

    Using the SWIS Query test page will be the subject of another entry.

     

    Regards,