In our networking environment, we use a third-party alerting system alongside our SolarWinds platform to notify our technicians after hours when a node has a problem (down node, down interface, etc.). The alerting system uses one email from SolarWinds to create the alert and a second email to "clear" it. For example, if an interface bounces, SolarWinds detects the down interface and sends the alert email; when it polls again and sees that the interface is up, it sends another email stating that the alert has "cleared."

For our third-party alerting system, we had to develop a "correlator number" that ties the clear message to the alert message. Because there is a wait time before the technician is notified, an alert that clears within that time never reaches anyone, so our technicians aren't bothered after hours with false alarms. If an interface flaps multiple times, each flap is a separate alert instance. Otherwise, the alert and clear messages could reach our alerting system out of order, and an interface that is still in alert could be cleared by the clear message from a different instance. The technician then wouldn't be notified of what could be a critical issue, depending on what the interface is connected to. The "correlator number" fixes this problem (or at least it did for us). The number stays the same for both the "alert" email and the "clear" email of a given instance, tying the two together.
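To make the idea concrete, here is a minimal sketch of how a receiving system could pair messages by correlator and hold an alert for a wait time before notifying anyone. This is not our vendor's actual logic; the class name `HoldQueue`, its methods, and the 300-second hold time are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class HoldQueue:
    """Hypothetical receiver: pairs Set/Clear messages by correlator."""
    hold_seconds: float
    pending: dict = field(default_factory=dict)       # correlator -> time the Set arrived
    early_clears: set = field(default_factory=set)    # Clears seen before their Set
    notified: set = field(default_factory=set)        # alerts already sent to a technician

    def on_set(self, correlator: str, now: float) -> None:
        if correlator in self.early_clears:
            # The Clear for this instance arrived first (out of order): cancel both.
            self.early_clears.discard(correlator)
            return
        self.pending.setdefault(correlator, now)

    def on_clear(self, correlator: str) -> None:
        if correlator in self.pending:
            # Cleared inside the wait time: nobody gets notified.
            del self.pending[correlator]
        elif correlator in self.notified:
            self.notified.discard(correlator)
        else:
            # No matching Set yet; remember the Clear for when it shows up.
            self.early_clears.add(correlator)

    def due(self, now: float) -> list:
        """Return correlators whose wait time expired without a matching Clear."""
        ready = [c for c, t in self.pending.items() if now - t >= self.hold_seconds]
        for c in ready:
            del self.pending[c]
            self.notified.add(c)
        return ready


queue = HoldQueue(hold_seconds=300)
queue.on_set("flap-1", now=0)
queue.on_clear("flap-1")          # bounced and recovered: no notification
queue.on_set("flap-2", now=10)    # a second flap is a separate instance
print(queue.due(now=400))         # only the instance that never cleared
```

Because each instance carries its own correlator, a Clear from one flap can only ever cancel the Set from that same flap, regardless of the order in which the emails arrive.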
Here are the email alert and clear message settings we use, which allow us to work with our third-party alerting system:
Alert Messages | Clear Messages |
---|---|
Interface Down Alert: Subject: ${N=SwisEntity;M=Node.DisplayName} Alert: ${N=SwisEntity;M=Caption} is ${N=SwisEntity;M=StatusDescription} Message: Alert: Set Entity: ${N=SwisEntity;M=Node.DisplayName} Interface: ${N=SwisEntity;M=Name} Message: ${N=SwisEntity;M=Caption} is ${N=SwisEntity;M=StatusDescription}. Correlator: ${NodeName} ${N=SwisEntity;M=Name} ${SQL: SELECT AlertRefID FROM AlertConfigurations WHERE AlertID=${N=Alerting;M=AlertID}}${N=Alerting;M=AlertID}${N=SwisEntity;M=NodeID} | Interface Down Clear: Subject: ${N=SwisEntity;M=Node.DisplayName} Alert: ${N=SwisEntity;M=Caption} is ${N=SwisEntity;M=StatusDescription} Message: Alert: Clear Entity: ${N=SwisEntity;M=Node.DisplayName} Interface: ${N=SwisEntity;M=Name} Message: ${N=SwisEntity;M=Caption} is ${N=SwisEntity;M=StatusDescription}. Correlator: ${NodeName} ${N=SwisEntity;M=Name} ${SQL: SELECT AlertRefID FROM AlertConfigurations WHERE AlertID=${N=Alerting;M=AlertID}}${N=Alerting;M=AlertID}${N=SwisEntity;M=NodeID} |
When a Node goes down Alert: Subject: Alert: ${N=SwisEntity;M=Caption} is ${N=SwisEntity;M=StatusDescription} Message: Alert: Set Entity: ${NodeName} Message: ${NodeName} is ${N=SwisEntity;M=StatusDescription} Correlator: ${NodeName} ${SQL: SELECT AlertRefID FROM AlertConfigurations WHERE AlertID=${N=Alerting;M=AlertID}}${N=Alerting;M=AlertID}${N=SwisEntity;M=NodeID} | When a Node goes down Clear: Subject: Alert: ${NodeName} is ${N=SwisEntity;M=StatusDescription} Message: Alert: Clear Entity: ${NodeName} Message: ${N=SwisEntity;M=Caption} is ${N=SwisEntity;M=StatusDescription} Correlator: ${NodeName} ${SQL: SELECT AlertRefID FROM AlertConfigurations WHERE AlertID=${N=Alerting;M=AlertID}}${N=Alerting;M=AlertID}${N=SwisEntity;M=NodeID} |
When a Neighbor Goes Down Alert: Subject: Alert: The routing neighbor ${SQL: SELECT Caption FROM NodesData left join NodeIPAddresses on NodesData.NodeID=NodeIPAddresses.NodeID where (NodeIPAddresses.IPAddress='${N=SwisEntity;M=NeighborIP}' OR NodesData.IP_Address='${N=SwisEntity;M=NeighborIP}')} on ${NodeName} is down. Message: Alert: Set Entity: ${NodeName}'s routing neighbor ${N=SwisEntity;M=NeighborIP}. Message: ${NodeName}'s routing neighbor, ${SQL: SELECT Caption FROM NodesData left join NodeIPAddresses on NodesData.NodeID=NodeIPAddresses.NodeID where (NodeIPAddresses.IPAddress='${N=SwisEntity;M=NeighborIP}' OR NodesData.IP_Address='${N=SwisEntity;M=NeighborIP}')} (IP Address: ${N=SwisEntity;M=NeighborIP}), is down. Correlator: ${NodeName} ${SQL: SELECT AlertRefID FROM AlertConfigurations WHERE AlertID=${N=Alerting;M=AlertID}}${N=Alerting;M=AlertID}${N=SwisEntity;M=NeighborID} | When a Neighbor Goes Down Clear: Subject: Alert: The routing neighbor ${SQL: SELECT Caption FROM NodesData left join NodeIPAddresses on NodesData.NodeID=NodeIPAddresses.NodeID where (NodeIPAddresses.IPAddress='${N=SwisEntity;M=NeighborIP}' OR NodesData.IP_Address='${N=SwisEntity;M=NeighborIP}')} on ${NodeName} is down. Message: Alert: Clear Entity: ${NodeName}'s routing neighbor ${N=SwisEntity;M=NeighborIP}. Message: ${NodeName}'s routing neighbor, ${SQL: SELECT Caption FROM NodesData left join NodeIPAddresses on NodesData.NodeID=NodeIPAddresses.NodeID where (NodeIPAddresses.IPAddress='${N=SwisEntity;M=NeighborIP}' OR NodesData.IP_Address='${N=SwisEntity;M=NeighborIP}')} (IP Address: ${N=SwisEntity;M=NeighborIP}), is up. Correlator: ${NodeName} ${SQL: SELECT AlertRefID FROM AlertConfigurations WHERE AlertID=${N=Alerting;M=AlertID}}${N=Alerting;M=AlertID}${N=SwisEntity;M=NeighborID} |
Note: The "When a Neighbor Goes Down" alert refers to a routing neighbor. For the subject and message to show more than just an IP address, the neighbor must exist as a node in SolarWinds. It doesn't need to be anything more than a ping-only node (SNMP is not required), so the neighbor could be the ISP router, for example. The SQL script used in the subject and message of "When a Neighbor Goes Down" came from another user, tnice81. I found it in this Thwack post: Including the neighbor node's name in the "A routing neighbor went down" alert.
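For anyone building the receiving side themselves, here is a rough sketch of pulling the fields out of the message bodies in the table above. It assumes each field (`Alert:`, `Entity:`, `Interface:`, `Message:`, `Correlator:`) sits on its own line in the delivered email, which may not match your mail format. The function name `parse_body` and the sample values (node name, interface, correlator string) are invented for the example.

```python
import re

# Field labels used in the message bodies of the alert templates.
FIELD = re.compile(r"^(Alert|Entity|Interface|Message|Correlator):\s*(.*)$")


def parse_body(body: str) -> dict:
    """Collect 'Label: value' lines into a dict (assumes one field per line)."""
    fields = {}
    for line in body.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2).strip()
    return fields


# Invented sample bodies; real values come from the SolarWinds variables.
set_body = """Alert: Set
Entity: core-sw1
Interface: Gi0/1
Message: Gi0/1 is Down.
Correlator: core-sw1 Gi0/1 example-ref-4217"""

clear_body = set_body.replace("Alert: Set", "Alert: Clear").replace("is Down", "is Up")

set_msg = parse_body(set_body)
clear_msg = parse_body(clear_body)

# The two messages belong together when their correlators match.
print(set_msg["Alert"], clear_msg["Alert"], set_msg["Correlator"] == clear_msg["Correlator"])
```

Only the `Alert` field (Set or Clear) and the `Correlator` field are needed to match the two emails; the other fields are there for the technician reading the notice.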
We are currently using NPM version 11.5.2 with this configuration.