
Cisco 3850 issue that shows up in Orion (probably affects ALL Cisco switches - UPDATE!)

So, I'm curious whether anyone else is having an issue with Cisco 3850s that affects Orion. We were kind of lucky to find it.

We noticed it because a few of our switches were missing the "VLAN" checkbox that normally appears in "List Resources".

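For example, a working switch:

[Screenshot: List Resources on a working switch, with the VLAN checkbox present]

A non-working switch:

[Screenshot: List Resources on an affected switch, with the VLAN checkbox missing]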

The problem turns out to be that the OID 1.3.6.1.2.1.17.1.4.1.2 (dot1dBasePortIfIndex) returns nothing when Orion (or anything else!) walks it on the switches that aren't working right. This is causing issues with reports we want to generate, with topology, and with other things. Out of 861 switch stacks, I'm seeing it on 35 of ours. The affected IOS versions I'm seeing are 03.06.06E, 16.3.7, and 16.3.8. Switches that were just rebooted are affected, as well as ones that have been up 80+ weeks. I can't find any commonality in terms of uptime, memory, CPU, stack size, IOS, etc.

I now use this SWQL query to find them: if the three columns after "Caption" (TPE.InstanceID, TPE.Enabled, and TPE.Node.Caption) come back NULL, the switch seems to be affected.

If you have a bunch of 3850s on your network, you can either run this in SWQL Studio or add a "Custom Query" resource to any page and it will run right there. I'm curious whether anyone else with a decent number of 3850s is seeing the issue, or, if you have them and aren't seeing it, which IOS versions you're running.

I have a case open with Cisco, but I think they're at a loss. If anyone else is seeing the issue, it would probably be good for you to open a case too, so this gets a higher priority for figuring out what's up and getting it fixed.

SELECT N.NodeID, N.IP_Address, N.Caption, TPE.InstanceID, TPE.Enabled, TPE.Node.Caption, N.CPUCount,
       WeekDiff(N.LastBoot, GetDate()) AS Weeks, N.SwitchStack.MemberCount, N.MemoryAvailable, N.CPULoad,
       N.HardwareHealthInfos.Model, N.IOSVersion, N.IOSImage
FROM Orion.Nodes N
LEFT JOIN Orion.TechnologyPollingAssignments TPE ON (TPE.InstanceID = N.NodeID) AND (TPE.TechnologyPollingID = 'Core.Topology.Vlan')
WHERE (N.MachineType LIKE '%38xx%')
ORDER BY TPE.InstanceID, N.IP_Address
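For anyone who would rather script the check than use SWQL Studio, here is a minimal Python sketch. It assumes the `orionsdk` package is installed and that `SwisClient.query()` returns a dict with a `results` list. The trimmed query, the `VlanAssignment` alias, `find_affected()`, and the hostname/credentials are hypothetical names for illustration, not anything defined by Orion.

```python
"""Flag 38xx nodes that have no Core.Topology.Vlan polling assignment (sketch)."""

from typing import Any, Dict, List

# Trimmed version of the query above. The alias is hypothetical; it just gives
# the joined column an unambiguous key in the returned rows.
QUERY = """
SELECT N.NodeID, N.IP_Address, N.Caption, TPE.InstanceID AS VlanAssignment
FROM Orion.Nodes N
LEFT JOIN Orion.TechnologyPollingAssignments TPE
  ON (TPE.InstanceID = N.NodeID) AND (TPE.TechnologyPollingID = 'Core.Topology.Vlan')
WHERE (N.MachineType LIKE '%38xx%')
ORDER BY TPE.InstanceID, N.IP_Address
"""


def find_affected(rows: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return rows where the LEFT JOIN found no VLAN assignment (NULL -> None)."""
    return [row for row in rows if row.get("VlanAssignment") is None]


def main() -> None:
    # Imported here so find_affected() works without the SDK installed.
    from orionsdk import SwisClient

    # Placeholder hostname and credentials (hypothetical).
    swis = SwisClient("orion.example.com", "username", "password")
    rows = swis.query(QUERY)["results"]
    for row in find_affected(rows):
        print(row["NodeID"], row["IP_Address"], row["Caption"])
```

Call `main()` from an interactive session once the placeholders are filled in. Keeping the filtering in a separate function means you can also feed it rows exported from SWQL Studio.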

Post your responses here, or if you have any questions on how to run the query, just let me know!


So, from what I can tell at this point, this is due to SolarWinds taking some shortcuts to try to speed things up a bit. There are two changes you can make in a config file located in

volume:\Program Files (x86)\SolarWinds\Orion\Topology

The file is "SolarWinds.Topology.Pollers.dll.config". The changes you need to make are below. You will need to stop all the SolarWinds services before doing this, and you need admin privileges to change this file. It's advisable to back up the original file before making these changes.

[Screenshot: the two changes to SolarWinds.Topology.Pollers.dll.config]
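Since the advice is to keep a copy of the original file, here is a minimal Python sketch of a timestamped backup. It takes the full path as an argument, because the volume Orion is installed on varies; `backup_config()` is a hypothetical helper of my own, not a SolarWinds tool. Run it with admin rights after stopping the services.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_config(config_path: Path) -> Path:
    """Copy config_path to '<name>.<timestamp>.bak' next to it and return the copy."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    backup = config_path.with_name(f"{config_path.name}.{stamp}.bak")
    shutil.copy2(config_path, backup)  # copy2 also preserves the file's timestamps
    return backup
```

For example, `backup_config(Path(r"<volume>:\Program Files (x86)\SolarWinds\Orion\Topology\SolarWinds.Topology.Pollers.dll.config"))`, with your actual drive letter. Two backups in the same second would share a name, which seems acceptable for a manual edit.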

Once this is done, the "VLAN" box should show up on the devices when you go into "List Resources" (you might need to do a "Force Refresh"). Make sure the VLAN box is selected AND click Submit once it is.

According to SolarWinds, "The settings only have an impact on the discovery process when doing a List Resources. It will take more time now as it's looking for the other VLANs to discovery the topology pollers. It doesn't have a impact on the discovery process when you run a Discovery as it already does a full inventory. It has no impact on polling". Which makes me wonder why it works this way, if the setting has such a minor impact on the product but can cause issues like this.

I also confirmed that if you make these changes, running the "Configuration Wizard" will revert them. Not something I like to hear!!

So no changes are required on the devices, just a couple of minor changes in Orion.
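Because the Configuration Wizard silently reverts the file, one workaround is to keep a copy of your edited version outside the Orion directory and compare against it after every wizard run. A minimal Python sketch follows. The "edited copy" idea and both function names are hypothetical, my own workaround rather than a SolarWinds feature. As with the original edit, stop the SolarWinds services before copying anything back.

```python
import filecmp
import shutil
from pathlib import Path


def changes_still_applied(live_config: Path, edited_copy: Path) -> bool:
    """True if the live config is byte-for-byte identical to the edited copy."""
    return filecmp.cmp(live_config, edited_copy, shallow=False)


def reapply_if_reverted(live_config: Path, edited_copy: Path) -> bool:
    """Copy the edited copy back over the live file if they differ.

    Returns True if a copy was made, False if the changes were still in place.
    """
    if changes_still_applied(live_config, edited_copy):
        return False
    shutil.copy2(edited_copy, live_config)
    return True
```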


So, SolarWinds has apparently decided that this is not a bug. According to them: "I've discussed your findings with other members of our team to discuss if there are any other options that we have to resolve your issue. Unfortunately what you're wanting is currently not a feature of the product. You can submit a Feature Request but there is no timeline on if or when it will be implemented."

This approach has definitely annoyed me. I consider it a bug, in that they're not interpreting the results of their polling correctly...

What are your thoughts?   Bug, or Feature Request?

Sounds like what you are describing is a legitimate bug in IOS. I'm not sure what the feature request would be in NPM, but maybe I'm missing something here.


So I would say this is less of a bug and more of a non-standard SNMP "feature" that Cisco has implemented. They have it documented in various places, but I am not aware of any other vendors who do it the way Cisco does.

SNMP Community String Indexing - Cisco

I looked at the IETF standard for the bridge MIBs and really cannot see anything there indicating that they intended that kind of capability, i.e. getting different results from an OID via these "@" contexts.

RFC 4188 - Definitions of Managed Objects for Bridges

So List Resources only tests with the community string it was given, and it looks like Cisco's default behavior, when no context is given on the community string, is to use VLAN 1, which may or may not be in use.

Not sure of a clean solution to the issue.

If there is another OID that just gives a list of all configured VLAN IDs, SW could query that and then do a series of SNMP scans, appending each context in turn. But that sounds like it could become really taxing in terms of polling load, on some polled devices with lots of VLANs configured.

- Marc Netterfield, Github
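The approach described above (get the VLAN list, then append each VLAN as a context on the community string) can be sketched in Python. It also makes the cost visible: one extra walk per VLAN per switch. Skipping the built-in 1002-1005 VLANs is my own assumption, to avoid pointless walks, and `indexed_communities()` is a hypothetical helper.

```python
from typing import Iterable, List

# 1002-1005 are the legacy fddi/token-ring default VLANs that Cisco creates
# automatically (see the VLAN list further down); skipping them is an assumption.
RESERVED_VLANS = range(1002, 1006)


def indexed_communities(community: str, vlan_ids: Iterable[int]) -> List[str]:
    """Build Cisco community-string-indexing contexts such as 'public@10'."""
    return [
        f"{community}@{vlan}"
        for vlan in sorted(set(vlan_ids))  # de-duplicate, walk in VLAN order
        if vlan not in RESERVED_VLANS
    ]
```

On a switch with N user VLANs, this turns one walk into N walks, which is the polling load concern raised above.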

So, from how Cisco explains it, that particular OID (dot1dBasePortIfIndex) is a list of the ifIndexes of the ports in a given VLAN. Since the "default" is VLAN 1, if no ports are in VLAN 1, it returns no ports. They say they interpret this the same way across all devices and IOS versions, and from what I've seen they do.

There are other OIDs that give you the list of VLANs, their names, and so on. For example, if you query the OID below, you get a list similar to what is shown.

VLAN Name: 1.3.6.1.4.1.9.9.46.1.3.1.1.4

'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1' => "default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.10' => "VLAN0010"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.100' => "VLAN0100"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.101' => "VLAN0101"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.102' => "vlan102"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.109' => "VLAN0109"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1000' => "VLAN1000"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1001' => "VLAN1001"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1002' => "fddi-default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1003' => "token-ring-default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1004' => "fddinet-default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1005' => "trnet-default"

Taxing in terms of polling load? SolarWinds is supposed to do this for all devices, so I'm not really sure why it would be more taxing than its regular behavior...
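The VLAN list above is the vtpVlanName column of CISCO-VTP-MIB, whose rows are indexed by management domain and then VLAN ID. That is why every OID ends in `.1.<vlan>`. A minimal Python sketch that turns those results into a VLAN ID to name map follows. It assumes the walk has already been collected into a dict keyed by full OID, as in the list above; `vlans_from_walk()` is a hypothetical helper.

```python
from typing import Dict, Mapping

VTP_VLAN_NAME = "1.3.6.1.4.1.9.9.46.1.3.1.1.4"  # CISCO-VTP-MIB vtpVlanName


def vlans_from_walk(walk: Mapping[str, str]) -> Dict[int, str]:
    """Map VLAN ID -> name from vtpVlanName results keyed by full OID.

    The row index is <managementDomainIndex>.<vlanIndex>, so the VLAN ID is
    the last arc of each OID.
    """
    prefix = VTP_VLAN_NAME + "."
    vlans: Dict[int, str] = {}
    for oid, name in walk.items():
        oid = oid.lstrip(".")  # some tools print a leading dot
        if oid.startswith(prefix):
            vlans[int(oid.rsplit(".", 1)[1])] = name
    return vlans
```

The keys of the result are the VLAN IDs you would append as `@<vlan>` contexts.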


So, to be sure of knowing all the port mappings, wouldn't they need to poll

1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@1
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@10
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@100
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@101
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@102
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@109
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@1000
etc.

If they don't poll all the VLANs in use, wouldn't they always run the risk of having the problem you started with in the first place? Or do all interfaces show up even if an interface doesn't carry that VLAN, so we only need to poll the first existing VLAN?

- Marc Netterfield, Github
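Whether every interface shows up in every context is exactly the open question above. So rather than assume either answer, a script can record which contexts listed each interface. A minimal Python sketch follows. It assumes each per-VLAN walk of dot1dBasePortIfIndex has already been parsed into a {bridgePort: ifIndex} dict; the function names are hypothetical. It keys on ifIndex rather than bridge port number, to avoid assuming the bridge port numbering is the same in every context.

```python
from typing import Dict, Mapping, Set


def vlans_by_ifindex(per_vlan: Mapping[int, Mapping[int, int]]) -> Dict[int, Set[int]]:
    """Invert per-VLAN {bridgePort: ifIndex} walks into {ifIndex: {vlan, ...}}."""
    seen: Dict[int, Set[int]] = {}
    for vlan, port_map in per_vlan.items():
        for if_index in port_map.values():
            seen.setdefault(if_index, set()).add(vlan)
    return seen


def missing_from_default(per_vlan: Mapping[int, Mapping[int, int]],
                         default_vlan: int = 1) -> Set[int]:
    """ifIndexes listed in some VLAN context but not in the default one."""
    return {
        if_index
        for if_index, vlans in vlans_by_ifindex(per_vlan).items()
        if default_vlan not in vlans
    }
```

On an affected switch, where VLAN 1 returns nothing, every interface ends up in `missing_from_default()`. That is the information a default-context-only poll never sees.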

We have 3 out of 6 showing this behaviour. They are the top 3 in this list.

[Screenshot: query results, with the 3 affected switches at the top]


Interestingly, I did a List Resources on one of them and it showed a cached version. I did a Force Refresh and then it showed the VLAN option, so one of its interfaces now had VLAN 1 on it. On the other 2, once I added an interface to VLAN 1, I got the VLAN option.


So, I did some more testing with both an Arista switch and an older Cisco switch.

Long story short, it's probably not an issue that affects multiple vendors. The Arista switch always returned data when that OID was walked, no matter how the ports were configured.

However, on an old Cisco 2960-24TT-L, I observed the same behavior, both via snmpwalk and with "List Resources" missing the "VLAN" box in Orion. When all ports were assigned to VLANs other than VLAN 1, the issue was present. When at least one port was left in the default VLAN, it wasn't. This was a 12.2(X) train of IOS, too, so this issue has probably been around for quite a while!!

So it's very possible that, if you have Cisco switches, quite a bit of information might be missing for them!!


Out of 73 nodes, we have one that seems to be affected:

2 members
up for 49 weeks
WS-C3850-24S-E
IOS Version: 03.07.01.E RELEASE SOFTWARE (fc3)
IOS Image: CAT3K_CAA-UNIVERSALK9-M

Of the 72 unaffected nodes:

uptime ranges from 1 week to 98 weeks
35 others have the same IOS Version
all 72 have the same IOS Image


Oh hey, we might have found a commonality that could explain why some switches behave this way and others don't. I've only gone through a few of my switches so far, but it seems to be limited to switches that are either 100% full with a dynamically assigned VLAN (other than VLAN 1) on each port, or have all the end-user ports statically configured to a VLAN other than VLAN 1. Also, the trunk ports might all need to have specific VLANs allowed on them.

One other "symptom" we've found is that the SNMP walk works for one of the active VLANs on the switch, but not for VLAN 1. E.g., if all your switchports have "switchport access vlan 34" on them,

snmpwalk -v 2c -c MySNMPCommunity@34 <IP of Switch> 1.3.6.1.2.1.17.1.4.1.2

returns the port list, while

snmpwalk -v 2c -c MySNMPCommunity <IP of Switch> 1.3.6.1.2.1.17.1.4.1.2

returns nothing.

Note the "@34" appended to the community string in the first example, which selects that VLAN's context (Cisco's community string indexing).
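To check a switch for this symptom without going through Orion, a small Python wrapper can run both walks and compare the results. A minimal sketch, assuming Net-SNMP's `snmpwalk` is on the PATH. It adds `-On` so OIDs print numerically and are easy to parse, and the helper names are hypothetical. Treat "empty default context, populated VLAN context" as a heuristic, not a diagnosis.

```python
import subprocess
from typing import Dict, List, Optional, Tuple

DOT1D_BASE_PORT_IFINDEX = "1.3.6.1.2.1.17.1.4.1.2"  # BRIDGE-MIB dot1dBasePortIfIndex


def snmpwalk_cmd(community: str, host: str, vlan: Optional[int] = None,
                 oid: str = DOT1D_BASE_PORT_IFINDEX) -> List[str]:
    """Build a Net-SNMP snmpwalk argv; a VLAN context goes on the community string."""
    context = community if vlan is None else f"{community}@{vlan}"
    # -On prints OIDs numerically so the output is easy to parse.
    return ["snmpwalk", "-v", "2c", "-On", "-c", context, host, oid]


def parse_ifindexes(output: str, oid: str = DOT1D_BASE_PORT_IFINDEX) -> Dict[int, int]:
    """Parse lines like '.<oid>.<bridgePort> = INTEGER: <ifIndex>' into a dict.

    Lines such as '... = No Such Instance currently exists at this OID' are ignored.
    """
    prefix = "." + oid + "."
    ports: Dict[int, int] = {}
    for line in output.splitlines():
        name, sep, value = line.partition(" = INTEGER: ")
        if sep and name.startswith(prefix):
            ports[int(name[len(prefix):])] = int(value.strip())
    return ports


def compare_contexts(community: str, host: str,
                     vlan: int) -> Tuple[Dict[int, int], Dict[int, int]]:
    """Walk the default context and one VLAN context; return both port maps."""
    results = []
    for context_vlan in (None, vlan):
        proc = subprocess.run(snmpwalk_cmd(community, host, context_vlan),
                              capture_output=True, text=True, check=False)
        results.append(parse_ifindexes(proc.stdout))
    return results[0], results[1]
```

For example, `compare_contexts("MySNMPCommunity", "<IP of Switch>", 34)`. An empty first dict alongside a non-empty second one matches what's described above.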

Interesting!

Our network team has this in their queue, but I'm not sure what traction it will get above projects and the like. We'll see, I guess.


Interesting! Not far off from what I'd expect, though. We're affected on about 4% of our nodes, which would be 2.92 nodes out of 73, so 1 isn't too far off, I don't think. Your IOS version is kind of "in the middle" of the versions we're using, so that would make sense. Looks like a stack of 2 switches? Of our affected switches, only one is a stack.

Cisco is giving me a bit of a runaround right now; hoping our SE will step in. I already asked for the case to be moved to another engineer while ours was taking a day off, but they didn't listen. I gave him all the info he wanted in terms of "show" commands from 3 non-working switches and 1 working one, plus all the stats I generated, including an SNMP walk from one of the non-working switches. But now he wants the same from 2 other switches, 1 working and 1 not, saying that getting it from these other switches will make it "easier to look up and reproduce". Why does he need an snmpwalk of a working device from me? It's working!

Sorry, a bit annoyed with them...

Going to open a case on yours?
