Level 8

Cisco Devices Falsely Reporting As Down

Hi all,

     I've been having a persistent issue within SolarWinds. Our company has a main facility as well as a number of remote facilities. A few times each week (usually in the morning), SolarWinds will show that most or all of the Cisco 3700 series switches at one of our non-primary locations are down. I can still ping these devices just fine, and by the end of the day the false alerts will usually have sorted themselves out. I have tried tweaking our polling settings, thinking it might just be network latency causing the issue, but that hasn't yielded any results yet. I have changed the polling intervals for the nodes in question and adjusted the "node down" alert to allow more time before it fires.

     I was wondering if anyone else has had similar issues and what steps were taken to resolve it.

     I appreciate any and all feedback and will gladly share any more information that may be useful.

     Thanks!

17 Replies
Level 12

You might also try the poller tool in the install directory, if you have access; it might help you diagnose the problem. You can change the polling method in there and see which one, if any, the node is responding to.

mine was in C:\Program Files (x86)\SolarWinds\Orion

[Screenshot: pastedImage_1.png]


Are you using ICMP or SNMP for Status Polling? ICMP is normally the default unless it’s restricted.

Edit: This is located in List Resources of the Node and is different from the polling method found in Edit Node.

- David Smith

SNMP


I would suggest changing that to ICMP if possible; that will give you an up/down status even if there are problems with SNMP connectivity or the SNMP agent.
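If it helps to confirm what's going on before you switch, below is a rough sketch of a check you could run from the poller box itself during one of the outage windows. This is only an illustration: it assumes Python 3 with the pysnmp library installed, and the switch IP and community string are placeholders for your environment.

# Rough sketch: compare ICMP vs SNMP reachability from the poller box.
# Assumes Python 3 with pysnmp installed (pip install pysnmp); the switch
# IP and community string below are placeholders.
import platform
import subprocess

from pysnmp.hlapi import (
    SnmpEngine, CommunityData, UdpTransportTarget,
    ContextData, ObjectType, ObjectIdentity, getCmd,
)

SWITCH_IP = "10.0.50.10"   # placeholder: an affected 3750 at the remote site
COMMUNITY = "public"       # placeholder: your read-only community string


def icmp_up(host):
    """Return True if the host answers a couple of pings."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    result = subprocess.run(
        ["ping", count_flag, "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def snmp_up(host, community):
    """Return True if an SNMP GET of sysName.0 succeeds."""
    error_indication, error_status, _, _ = next(
        getCmd(
            SnmpEngine(),
            CommunityData(community),
            UdpTransportTarget((host, 161), timeout=2, retries=1),
            ContextData(),
            ObjectType(ObjectIdentity("1.3.6.1.2.1.1.5.0")),  # sysName.0
        )
    )
    return error_indication is None and not error_status


print("ICMP reachable:", icmp_up(SWITCH_IP))
print("SNMP reachable:", snmp_up(SWITCH_IP, COMMUNITY))

If ICMP comes back True while SNMP comes back False during one of those windows, the switch isn't really down; something is eating the SNMP traffic (a firewall, an ACL, or the agent itself).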

- David Smith
MVP

To validate what is or is not happening, you could configure IP SLA jobs on the affected Cisco devices. Even if they're just ICMP probes, you may gain some insight; have them ping a SolarWinds poller. Another option is NetPath.
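If it saves anyone some typing, a rough way to push a basic ICMP-echo IP SLA entry to an affected switch is sketched below using the netmiko Python library. The hostname, credentials, SLA entry number, and poller IP are placeholders, and the exact "ip sla" syntax varies a little between IOS versions, so treat it as a starting point rather than a drop-in config.

# Rough sketch: configure an ICMP-echo IP SLA probe toward a SolarWinds
# poller using netmiko (pip install netmiko). All host names, credentials,
# and the SLA entry number are placeholders; adjust for your IOS version.
from netmiko import ConnectHandler

POLLER_IP = "10.0.0.5"   # placeholder: a SolarWinds poller address

switch = {
    "device_type": "cisco_ios",
    "host": "10.0.50.10",    # placeholder: an affected 3750
    "username": "netadmin",
    "password": "changeme",
    "secret": "changeme",
}

ipsla_commands = [
    "ip sla 10",
    "icmp-echo {}".format(POLLER_IP),
    "frequency 60",
    "exit",
    "ip sla schedule 10 life forever start-time now",
]

with ConnectHandler(**switch) as conn:
    conn.enable()                                   # privileged exec
    print(conn.send_config_set(ipsla_commands))     # push the probe config
    print(conn.send_command("show ip sla statistics 10"))  # spot-check it

Once the probe has been running for a while, gaps or timeouts in its history give you an independent view of reachability between the site and the poller.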


Might someone have changed the access control list rules allowing your pollers to discover these switches?

Could one or more firewall rules have changed that would deny your pollers from accessing the switches via SNMP?

Did one or more SNMP community strings change, either on the switches or on the poller(s)? They have to stay in sync on both sides of the link for systems to be properly monitored and show "up" when you're monitoring via SNMP instead of ICMP.

Nothing with the community string has changed, and I don't believe anyone has touched the access control list lately (I'll ask around just to be safe). I was actually thinking this may have something to do with our firewall, so it's reassuring that you mentioned it. Some of our other devices that poll via SNMP don't pull all of the info they should, i.e. no system name, device type, etc., just an IP (which is very annoying). Those devices don't go down, though; it only seems to be our 3700 series switches.

Thank you for your feedback!


Ask your firewall administrator to check their firewall logs for traffic between your poller(s) and the nodes to see what's being denied (if anything).  Then request it be allowed.

Yep; we had that at a site because of an ASA; it treated the polling traffic as bad and blocked it.

Level 13

How does the packet loss chart look?

It's most likely packet loss triggering the alerts. Look for the ugly red lines.

[Screenshot: 2018-12-14_12-43-03.png]
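If you'd rather pull the numbers than eyeball the chart, something along these lines will dump the polled packet-loss history for a node over the last week. It's only a sketch: it assumes the orionsdk Python module and an account that can query SWIS, and the Orion hostname, credentials, and node caption are placeholders.

# Rough sketch: pull a week of polled packet-loss figures for one node
# from SWIS via orionsdk (pip install orionsdk). The server name,
# credentials, and node caption are placeholders.
import requests
from orionsdk import SwisClient

requests.packages.urllib3.disable_warnings()   # Orion often has a self-signed cert

swis = SwisClient("orion.example.local", "admin", "changeme")

query = """
SELECT n.Caption, r.DateTime, r.PercentLoss
FROM Orion.ResponseTime r
JOIN Orion.Nodes n ON n.NodeID = r.NodeID
WHERE n.Caption = @caption
  AND r.DateTime > ADDDAY(-7, GETUTCDATE())
ORDER BY r.DateTime
"""

results = swis.query(query, caption="REMOTE-SW-01")   # placeholder caption
for row in results["results"]:
    loss = row["PercentLoss"]
    if loss is not None and loss >= 90:               # flag the ugly spikes
        print(row["DateTime"], row["Caption"], "{}% loss".format(loss))

If the 100% loss intervals line up with the morning alert storms, that matches the pattern you described.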

Forgot to hit reply. Sorry! Take a look at what I posted if you'd like


You're welcome; happy to help.

Your subject is Cisco, but you mention servers.

What is the polling method for these problem hosts? Agent? ICMP? SNMP? WMI?


And yes, these are switches. Didn't mean to type servers there.


SNMP


Thank you! You were right; all the servers are showing 100% packet loss. I can still ping these devices, so could this be an issue with our polling method (SNMP) not working correctly?


100%? All of them?

That doesn't sound good.

Can you post a one-week screenshot of an example node like I did above?

I suggest applying a packet-loss threshold to your node down alerts. For example, we only alert above 90% loss to cut down on noise.
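To sanity-check what that threshold would do before you touch the alert, a quick SWQL query like the one below lists nodes Orion currently marks Down whose measured loss is under 90%, i.e. the alerts a loss threshold would have suppressed. Again just a sketch: it assumes the orionsdk module, and the server name and credentials are placeholders.

# Rough sketch: list nodes currently marked Down whose packet loss is
# below the 90% threshold mentioned above. Assumes orionsdk; the server
# name and credentials are placeholders. Status code 2 means "Down".
import requests
from orionsdk import SwisClient

requests.packages.urllib3.disable_warnings()

swis = SwisClient("orion.example.local", "admin", "changeme")

query = """
SELECT Caption, IPAddress, Status, PercentLoss
FROM Orion.Nodes
WHERE Status = 2
  AND (PercentLoss IS NULL OR PercentLoss < 90)
ORDER BY Caption
"""

for row in swis.query(query)["results"]:
    print(row["Caption"], row["IPAddress"], "loss =", row["PercentLoss"])

Anything that shows up there is a candidate for the kind of noise the threshold is meant to cut.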

Also, the obscure setting below is discussed a lot on this very topic - what is yours set to? I believe 120 is the default.

[Screenshot: 2018-12-14_13-14-53.png]


[Screenshot: SW09.JPG]

This is what all three of the down nodes' graphs look like. All exactly the same.

As for the "node warning level," it is set to 120 seconds. I've reached out to the net admin at that location to see if he can confirm or deny these servers' state. Thanks for the info about packet loss. I'm still relatively new to SolarWinds and didn't know that screen existed!
