Does anyone know how to create an alert notification for when an interface goes up or down? I'm only interested in getting notifications from the interfaces I'm monitoring. For example, I have a switch with 200 ports, but I only want to be alerted on the up/down status of its two uplinks, not on all the user ports. I have about a hundred of these switches/routers.

  • The way I would do this is to create a Custom Property under Interfaces with the Custom Property Editor on the SolarWinds server. Call it Uplink and use the True/False values to flag your uplinks.

    Then use the Advanced Alert Manager to create an alert using the Custom Property you created.

    Using Interface as the type of property to monitor, create the following alert:

    Trigger Alert when all of the following apply

         Uplink is equal to True

         Interface Status is equal to Down

    You can structure the alert how you need or add conditions if you need to filter out devices another way.
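    The trigger logic above can be sketched in a few lines of Python. This is only an illustration of the "all of the following apply" condition, not how the Advanced Alert Manager is implemented; the `Interface` class, its field names, and the sample switch and port names are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    node: str
    name: str
    status: str   # e.g. "Up", "Down", "Warning", "Unknown"
    uplink: bool  # hypothetical stand-in for the "Uplink" custom property


def should_alert(iface: Interface) -> bool:
    """Both conditions must hold, mirroring 'all of the following apply'."""
    return iface.uplink and iface.status == "Down"


# Made-up sample data: one user port and two uplinks on the same switch.
interfaces = [
    Interface("sw01", "Gi1/0/1", "Down", False),  # user port, ignored
    Interface("sw01", "Te1/1/1", "Down", True),   # uplink down, alerts
    Interface("sw01", "Te1/1/2", "Up", True),     # uplink up, no alert
]

alerts = [i for i in interfaces if should_alert(i)]
for i in alerts:
    print(f"ALERT: {i.node} {i.name} is {i.status}")
```

    Only the uplink that is actually down triggers; the user port is skipped even though it is down too.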

  • Thank you Rich. I might just unmanage the other interfaces and not worry about the custom property, but that is a neat trick.

    Playing around with these alerts I discovered that an interface can have several different states other than up and down. Do you happen to know what they mean? Especially "Warning" and "Unknown"?

  • Warning is when the interface is going down and hasn't responded to ICMP for a set amount of time. It's now in a fast poll state where the polling engine is sending ICMP more rapidly to verify the down status.  If it does reply during the fast poll it goes back to an Up status.  If the interface never responds during the fast poll it's assumed down and moved to that status.  You can change the polling and fast poll rates on the polling engines under settings. Unknown is a status that Orion uses if it can't poll the interface for the initial poll after being set up.

    I also don't monitor any ports except my uplinks. The rest are almost always user ports, and I don't need statistics from those ports. Plus the added job weight on our pollers adds up very quickly: we have about 30 stacks of 48-port switches, with 4 to 5 switches per stack, totaling over 120 switches.

    If you like the statistics, you can also set the non-uplink ports as "Unpluggable" from the Edit Properties option of the interface you want to set. This is a great option to keep collecting stats, but it will ignore any alerts you built against interfaces. Since you have labeled that interface as one that can be unplugged, it can be offline, up, down, warning, anything basically, and Orion won't care.

  • I'm not sure the Warning state has anything to do with ICMP. I have Layer 2 interfaces in Warning and Unknown states, and when I go into the details for these interfaces, the poll stats show up as intermittent.

  • Well, I didn't explain it the best, since interfaces aren't technically polled with ICMP. At least I can't see how they would be, since they aren't Layer 3 devices and don't adhere to any Layer 3 routing on the port. My statement should have said nodes or Layer 3 devices with IP addresses for the example I gave. I would imagine interfaces are polled with SNMP in some form or fashion, but the status should still mean the same thing: Warning = missing/dropping polls; Unknown = has never responded to a poll.

    I'm not sure how a device would ever be Unknown once it has been polled. Its status was known at one point, so anything other than up would be down.
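    To make that mental model concrete, here is a rough Python sketch of the status mapping as described in this thread: Unknown means never answered, Warning means some polls in the recent window were missed, Down means all of them were. This is an assumption about how to read the statuses, not Orion's documented logic; the function name, parameters, and the idea of a fixed "recent window" are invented for illustration.

```python
from typing import Sequence


def classify(ever_responded: bool, recent_polls: Sequence[bool]) -> str:
    """Map poll results to a status label, per the description in this thread.

    recent_polls holds True for each answered poll and False for each
    missed poll in a recent window (hypothetical bookkeeping).
    """
    if not ever_responded or not recent_polls:
        return "Unknown"   # never successfully polled
    if all(recent_polls):
        return "Up"        # every recent poll answered
    if not any(recent_polls):
        return "Down"      # every recent poll missed
    return "Warning"       # intermittent: some polls missed


print(classify(False, [False, False]))      # newly added, never answered
print(classify(True, [True, False, True]))  # dropping some polls
```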

  • I guess you're right. It's the polling that is being missed. But I respectfully don't agree with your last statement.

    Orion may encounter any number of issues trying to reach or poll a specific device, or its interfaces for that matter, but that does not necessarily mean the device or interface is "down". It so happens that one of the switches I'm having these "Warning" and "Unknown" issues with is the very switch I'm connected to. If these uplink interfaces were really in a "down" state I'd be looking for another job by now.

    I found this article online but I have no firewalls in between these switches and the NPM:

  • Right. What I am saying, though, is that the interfaces in Warning have received and responded to a poll at some point, but are having trouble responding consistently. I haven't seen a node/interface stay in Warning when it doesn't receive a poll after a polling period, and I have only seen the Unknown status in my environment when I have never been able to poll the device I added to NPM or SAM.

    Run Wireshark on the Orion server and sniff your polling port.  Filter by IP address of the node and see what you are sending and getting back.

    To me it almost sounds like your poller is getting overloaded and just can't keep up with the polling anymore, so it's dropping polls and responses. Either that, or somewhere on your network between the switch and SolarWinds you have major congestion and the SNMP/ICMP polls are being de-prioritized and dropped at a router or a Layer 3 switch.

  • Yup. I'll try and find out what's causing the missing data.

  • Remember, the status identifiers tell you whether nodes/interfaces are responding or not, or whether they are having issues responding. A "Warning" or "Unknown" status isn't for anything other than the polling status of a node/interface. They don't report the health or hardware error status of a node/interface beyond showing Up (it responds) or Down (no response after x amount of time). They are simply an indication of the Up/Down status, or an in-between status due to packet loss or an inability to poll.

    From what I was reading, it seems you are or were thinking the Warning and Unknown statuses are reporting something other than the availability of the node or interface and its ability to respond to an ICMP or SNMP packet.

    When a node/interface is responding and up, you can get some error and buffer-drop statistics from SNMP, and they can show an issue on the node/interface or something coming from the line connected to it. But this will not take a node/interface out of Up (green) status while it reports these errors, unless the node/interface itself stops responding to these requests for information.

    Hi. What I do is have a series of key words in the interface description. If it is for an agent, we ignore it - we don't "do" port level management for users. However, in our description for any uplinked switches or other managed devices, such as servers, we add /* UPLINK TO XYZ @ IP ADDRESS / INTERFACE */ and create a custom trigger to send us alerts / tickets when ports go down. We tailor this to the situation, but this way the WAN team and Systems team get real time alerts when things go bump in the datacenter.

    Then we create the alert which emails the team as well as cuts a ticket directly to our queue:

    You can of course do this for your circuits, as well - we use /* L3 MPLS link to xxxxxx */ or any other carrier and do the same key word search for the word MPLS.  Keeps us plenty busy!
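    As a rough illustration of the key word approach, here is a small Python sketch that picks out interfaces whose description carries an UPLINK or MPLS tag inside /* ... */. The regular expression, the function name, and the sample descriptions (including the documentation-range IP address) are made up for this example; in Orion itself the match would be a condition on the interface description in the alert trigger, not Python.

```python
import re

# Match a /* ... */ tag that contains the word UPLINK or MPLS.
KEYWORDS = re.compile(r"/\*.*\b(UPLINK|MPLS)\b.*\*/", re.IGNORECASE)


def is_alertable(description: str) -> bool:
    """True if the description carries one of the agreed key words."""
    return KEYWORDS.search(description) is not None


# Hypothetical interface descriptions keyed by interface name.
descriptions = {
    "Gi1/0/1": "user desk 14",
    "Te1/1/1": "/* UPLINK TO core-sw1 @ 192.0.2.1 / Te1/0/1 */",
    "Gi0/0": "/* L3 MPLS link to carrier-a */",
}

matched = [name for name, desc in descriptions.items() if is_alertable(desc)]
print(matched)
```

    The user port is left out, so only the tagged uplink and the MPLS circuit would feed the alert.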