14 Replies · Latest reply on Feb 13, 2018 12:09 PM by jeilers

    How can I tell WHY an Interface is in warning?

    pseudocyber

      I have one "problem" EtherChannel, according to SolarWinds. Supposedly, the L2 EtherChannel port between two switch stacks "flaps" between Warning and Up. However, when I go into the switches, the physical ports are running clean with no errors, the EtherChannel port is clean with no errors, and LACP is clean with no errors. The Cisco side says nothing is wrong.

       

      How can I actually drill into SolarWinds and find out why it says the interface is in Warning? Did it get a trap? Did it get a syslog message? I ask because there are NO indications of any problems (traps or syslog) except for the event.

       

      12/19/2017 8:28:32 AM Event

      Interface Port-channel1 · FUAS1_Gi[12]/1/4 for node FUAS2 is Warning.
        • Re: How can I tell WHY an Interface is in warning?
          alphabits

          Are you able to check the Events table in the database for any associated messages for that interface?
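          As a rough illustration of that check, the Python sketch below runs the kind of query you might use, against a throwaway SQLite stand-in for the events table. The table layout (EventTime, NetObjectID, NetObjectType, Message) and the 'I' object-type code for interfaces are assumptions modeled loosely on the Orion database; verify the real column names in Database Manager before adapting the query.

```python
import sqlite3

# Hypothetical, simplified stand-in for the Orion Events table.
# The real schema may have different or additional columns.
SCHEMA = """
CREATE TABLE Events (
    EventTime     TEXT,     -- ISO-8601 text, so BETWEEN compares correctly
    NetObjectID   INTEGER,  -- ID of the object the event is about
    NetObjectType TEXT,     -- assumed: 'I' = interface, 'N' = node
    Message       TEXT
)
"""

def events_for_interface(conn, interface_id, start, end):
    """Return (EventTime, Message) rows for one interface within [start, end]."""
    query = """
        SELECT EventTime, Message
        FROM Events
        WHERE NetObjectType = 'I'
          AND NetObjectID = ?
          AND EventTime BETWEEN ? AND ?
        ORDER BY EventTime
    """
    return conn.execute(query, (interface_id, start, end)).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    # Made-up sample rows, for demonstration only.
    conn.executemany(
        "INSERT INTO Events VALUES (?, ?, ?, ?)",
        [
            ("2017-12-19 08:28:32", 42, "I", "Interface Port-channel1 is Warning."),
            ("2017-12-19 08:40:00", 42, "I", "Interface Port-channel1 is Up."),
            ("2017-12-19 08:30:00", 7, "I", "Unrelated interface event."),
        ],
    )
    for row in events_for_interface(conn, 42, "2017-12-19 08:00:00", "2017-12-19 09:00:00"):
        print(row)
```

          Storing timestamps as ISO-8601 text keeps the BETWEEN comparison correct in this sketch; a real SQL Server datetime column would not need that trick.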

          • Re: How can I tell WHY an Interface is in warning?
            romeoguerrero

            I have pretty much the same problem. I get TONS of alerts saying interfaces are in warning status, but there is no data whatsoever, on the device or in SolarWinds, to explain why SolarWinds thinks the interfaces should be flagged as warning. I have opened two separate support cases (1341821 and 1369990) for this issue, with no progress.

             

            The first tech I talked to tried to convince me that it was the Cisco device reporting the warning status. It took me some research, but I finally was able to prove that at least for the Cisco devices in question, there is no SNMP interface warning status. This means that Solarwinds is the one flagging the warning status.
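            That matches the standard IF-MIB: the operational status a switch reports through SNMP, ifOperStatus (OID 1.3.6.1.2.1.2.2.1.8, defined in RFC 2863), has no warning value at all. The short Python sketch below decodes the raw integers a poller would receive; the helper name decode_oper_status is just for illustration.

```python
from enum import IntEnum

class IfOperStatus(IntEnum):
    """ifOperStatus values defined in IF-MIB (RFC 2863)."""
    UP = 1
    DOWN = 2
    TESTING = 3
    UNKNOWN = 4
    DORMANT = 5
    NOT_PRESENT = 6
    LOWER_LAYER_DOWN = 7

def decode_oper_status(raw: int) -> str:
    """Map a raw polled integer to its IF-MIB name, flagging out-of-range values."""
    try:
        return IfOperStatus(raw).name
    except ValueError:
        return f"INVALID({raw})"

# None of the seven defined states is "warning", so a Warning status has to be
# something the monitoring software derives on its own.
print([decode_oper_status(v) for v in (1, 2, 7, 99)])
```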

             

            On the second support case, the tech had me create a test alert, and we put in a bunch of macros hoping that one of the data points would give some indication of why SolarWinds says the interface is in warning. Well, I just deleted over 26,000 email alerts, and every single one that I looked at showed everything as good except the interface status, which was Warning.

             

            I am now searching Thwack, hoping the community might have some answers. I'll post back here if I find anything helpful. We are also on NPM 12.1. The problem seemed to start recently, maybe within the past 3 or 4 months, but we have been on 12.1 for quite some time.

            • Re: How can I tell WHY an Interface is in warning?
              jeilers

              This might not be correct, but it's my best guess for an answer. If you go to the interface's page, click "Edit Interface", and scroll down to the bottom, you should see the interface's alerting thresholds.


              Have you checked whether any of those thresholds were exceeded while the interface was in warning status?
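              If you want to check that systematically rather than by eye, a small script can compare exported polling samples against the thresholds shown on that page. This is a minimal Python sketch; the metric names, the sample format, and the threshold numbers are placeholders, not SolarWinds defaults, and it assumes a value at or above a threshold counts as a breach.

```python
from typing import Dict, List, Tuple

# Placeholder thresholds: metric -> (warning, critical).
# Replace with the values from the interface's "Edit Interface" page.
THRESHOLDS: Dict[str, Tuple[float, float]] = {
    "percent_utilization": (70.0, 90.0),
    "errors_and_discards": (100.0, 250.0),
}

def find_breaches(samples: List[dict], thresholds=THRESHOLDS) -> List[tuple]:
    """Return (timestamp, metric, value, level) for every sample at or over a threshold."""
    breaches = []
    for sample in samples:
        for metric, (warning, critical) in thresholds.items():
            value = sample.get(metric)
            if value is None:  # metric not collected in this sample
                continue
            if value >= critical:
                breaches.append((sample["timestamp"], metric, value, "critical"))
            elif value >= warning:
                breaches.append((sample["timestamp"], metric, value, "warning"))
    return breaches

samples = [
    {"timestamp": "08:25", "percent_utilization": 12.0, "errors_and_discards": 0},
    {"timestamp": "08:30", "percent_utilization": 75.0, "errors_and_discards": 3},
]
print(find_breaches(samples))
```

              An empty result for the window in which the interface showed Warning would suggest the thresholds are not the cause.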

                • Re: How can I tell WHY an Interface is in warning?
                  romeoguerrero

                  I know you may be directing this at me, the OP, or both. I have definitely checked those thresholds. In fact, that was the first place I pointed SolarWinds tech support to. At no time have any of those thresholds been breached, and the SolarWinds data bears this out in every single instance.

                    • Re: How can I tell WHY an Interface is in warning?
                      jeilers

                      Yeah, I'm directing this at anyone who is worried about it, lol. I'm looking into it more now, but I'm not finding concrete information on exactly how that works beyond what I've posted so far. Have you looked at what is being saved in the database for that interface around that time? Is it possible that a syslog or trap alert or filter could be changing the status?

                        • Re: How can I tell WHY an Interface is in warning?
                          romeoguerrero

                          Yes, I even ran raw SQL queries in the Database Manager, trying to find any piece of data explaining why SolarWinds would say the interfaces are in warning status. No luck there either.

                           

                          It wouldn't, or couldn't, be a syslog message or trap, correct? Those are not truly integrated into the SNMP-polled side of things the way people seem to think. In other words, there is no mechanism built into SolarWinds that would receive a certain trap or syslog message and, based on that message, put an interface into Warning or any other status. SolarWinds is not nearly as integrated as people seem to think.

                           

                          As for alert filters, the alert that is triggering simply says "Alert if interface is in warning or critical status". So there is no threshold in the alert itself that is causing this.

                    • Re: How can I tell WHY an Interface is in warning?
                      rschroeder

                      I can suggest some plans that may help you discover the actual problem and remediate it:

                       

                      Plan 1:  Test using an alternate monitoring MIB

                      Many of us saw something similar when monitoring Cisco nodes with NPM, though specifically with GBIC status, temperature, or fans. The issue turned out to be caused by NPM defaulting to a less desirable monitoring MIB. It is covered thoroughly here:

                       

                      Hardware Health Polling - Preferred Cisco MIB

                       

                      I'm not saying this is or is not associated with your reported issue, but it may be worth a quick review, and even testing performance by swapping between the CISCO-ENTITY-SENSOR-MIB and the CISCO-ENVMON-MIB just to see if there's a difference.

                       

                      Plan 2:  Capture the data and prove where it comes from

                      Perhaps the best evidence you can discover will come from putting Wireshark into the environment and capturing traffic between the switch(es) reporting port-channel problems and the NPM poller to which they send syslog and traps. I'll take a chance and assume that poller is also the one that monitors them via SNMP.

                       

                      You could either build a Cisco SPAN monitor session on the switch's uplink ports and mirror them to another port, to which you'd attach a PC running Wireshark to capture the data, or you could install Wireshark on the SolarWinds server that's involved in the monitoring process. Either way, capture the data in real time and filter it so Wireshark displays only the traffic between the switch reported to have problems and the SolarWinds server it reports to and that manages it via SNMP.
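                      To narrow the Wireshark view to exactly that conversation, a display filter keyed on the two addresses and the standard UDP ports for SNMP polling (161), SNMP traps (162), and syslog (514) is usually enough. The Python helper below just builds that filter string; the IP addresses are examples, and it assumes the switch sends syslog over the default UDP transport.

```python
import ipaddress

# Standard UDP ports: SNMP get/response (161), SNMP traps (162), syslog (514).
PORTS = (161, 162, 514)

def build_display_filter(switch_ip: str, poller_ip: str) -> str:
    """Build a Wireshark display filter for switch <-> poller SNMP/trap/syslog traffic."""
    switch = ipaddress.ip_address(switch_ip)  # raises ValueError on a typo
    poller = ipaddress.ip_address(poller_ip)
    if switch.version != poller.version:
        raise ValueError("switch and poller must use the same IP version")
    field = "ip.addr" if switch.version == 4 else "ipv6.addr"
    ports = " || ".join(f"udp.port == {p}" for p in PORTS)
    return f"{field} == {switch} && {field} == {poller} && ({ports})"

print(build_display_filter("10.0.0.10", "10.0.0.20"))
```

                      Paste the printed string into Wireshark's display filter bar; if your switches send syslog over TCP, add the matching tcp.port term.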

                       

                      You should be able to identify all traps and syslog messages coming from the switch, and see all the SNMP management traffic between the poller and the switch. The data causing the Warning alerts should be in that stream. Once you've captured data during a period when the port-channel reports a problem, you'll be able to see whether the switch is sending that information to the server, or the server is pulling that data from the switch via SNMP.
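                      Once the capture is exported (for example, as source and destination address and port per packet), sorting the packets by direction and port answers the push-versus-pull question directly. This is a minimal Python sketch, assuming the standard ports and a hypothetical Packet record; counting SNMP requests against responses also gives a rough idea of whether any polls went unanswered.

```python
from collections import Counter
from typing import Iterable, NamedTuple

class Packet(NamedTuple):
    src: str
    dst: str
    src_port: int
    dst_port: int

def classify(pkt: Packet, switch: str, poller: str) -> str:
    """Label a packet as pushed by the switch or pulled by the poller."""
    if pkt.src == switch and pkt.dst == poller:
        if pkt.dst_port == 162:
            return "trap"           # switch pushes a trap
        if pkt.dst_port == 514:
            return "syslog"         # switch pushes a syslog message
        if pkt.src_port == 161:
            return "snmp-response"  # switch answers a poll
    elif pkt.src == poller and pkt.dst == switch and pkt.dst_port == 161:
        return "snmp-request"       # poller pulls data from the switch
    return "other"

def summarize(packets: Iterable[Packet], switch: str, poller: str) -> Counter:
    """Count packets per category for one switch/poller pair."""
    return Counter(classify(p, switch, poller) for p in packets)

sw, po = "10.0.0.10", "10.0.0.20"
capture = [
    Packet(po, sw, 50000, 161),
    Packet(sw, po, 161, 50000),
    Packet(po, sw, 50001, 161),   # request with no matching response
    Packet(sw, po, 49152, 514),
]
counts = summarize(capture, sw, po)
print(counts)
# Rough count only: SNMP retries can inflate the number of requests.
print("unanswered polls:", counts["snmp-request"] - counts["snmp-response"])
```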

                       

                      Plan 3:  Get TAC on the phone & troubleshoot it in real time with them

                      It sounds like you've already had a TAC session that did not capture the data.  You may have to go back to Cisco and escalate the case if your Wireshark capture proves it actually IS the switch sending the data.

                       

                      Plan 4:  Work with Solarwinds Support

                      Sometimes you need time and patience, and you may have to escalate the case, but if your Wireshark captures prove the switch is NOT sending that data, and if TAC proves (from real-time testing and the output of show tech-support) that there is NO problem with the switch, then SolarWinds has to step up to the plate and knock that ball out of the park.


                      Once you get this figured out, please post what you learned and what the cause & resolution are.  We'll be interested to read your discoveries.

                       

                      Swift packets!

                       

                      Rick Schroeder

                      • Re: How can I tell WHY an Interface is in warning?
                        romeoguerrero

                        Sorry it took so long to get back here, but I did manage to resolve this issue, for my case anyway. Something I learned is that when SolarWinds polls an interface's status or statistics via SNMP and does not get a response, that is another circumstance in which the interface is marked as Warning.
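                        A toy model makes it easier to see how that could look like flapping from the outside. The Python sketch below is hypothetical: it is not SolarWinds' actual logic, only the behavior described above (an unanswered poll yields Warning), and the mapping of the other ifOperStatus values is an assumption for illustration.

```python
from typing import List, Optional

# Assumed mapping of polled IF-MIB ifOperStatus values to a displayed status.
OPER_STATUS_TO_DISPLAY = {1: "Up", 2: "Down", 7: "Down"}

def derive_status(oper_status: Optional[int]) -> str:
    """oper_status is the polled ifOperStatus, or None if the SNMP poll got no reply."""
    if oper_status is None:
        # Behavior reported in this thread, not documented SolarWinds logic.
        return "Warning"
    return OPER_STATUS_TO_DISPLAY.get(oper_status, "Unknown")

def status_timeline(polls: List[Optional[int]]) -> List[str]:
    """Displayed status for each poll cycle in order."""
    return [derive_status(p) for p in polls]

# The port is up on every poll that is answered, yet the displayed status
# "flaps" whenever a single poll times out.
print(status_timeline([1, 1, None, 1]))
```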

                         

                        As for my fix, I still don't understand why it worked, but the issue went away. This problem was happening on Cisco Nexus 7Ks. I found that on almost all of the interfaces selected for monitoring, the "sub-objects" such as traffic statistics, error statistics, and interface status were unchecked. This made no sense to me, and we want all that data anyway. I went through all the interfaces and made sure those items were checked for monitoring. Since that change, the issue has not recurred.
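                        If you have many interfaces to check, a quick audit script can flag the ones with any of those sub-objects unchecked. This Python sketch assumes you have already exported, however your environment allows, a mapping of each interface to the set of sub-objects enabled on it; the labels are hypothetical and simply mirror the names used above.

```python
from typing import Dict, List, Set

# Hypothetical labels for the per-interface sub-objects described above.
REQUIRED_SUB_OBJECTS = {"interface status", "traffic statistics", "error statistics"}

def missing_sub_objects(interfaces: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return each interface that lacks one or more required sub-objects."""
    report = {}
    for name, enabled in interfaces.items():
        missing = sorted(REQUIRED_SUB_OBJECTS - enabled)
        if missing:
            report[name] = missing
    return report

inventory = {
    "Ethernet1/1": {"interface status", "traffic statistics", "error statistics"},
    "Ethernet1/2": {"interface status"},
    "Ethernet1/3": set(),
}
print(missing_sub_objects(inventory))
```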