out of 73 nodes; we have one that seems to be affected.
up for 49 weeks
IOS Version: 03.07.01.E RELEASE SOFTWARE (fc3)
IOS Image: CAT3K_CAA-UNIVERSALK9-M
of the 72 unaffected nodes:
uptime from 1 week to 98 weeks
35 others have the same IOS Version
all 72 have the same IOS Image
Interesting! Not far off from what I'd expect, I guess. We're affected on about 4% of our nodes, which would be 2.92 nodes out of 73, so 1 isn't too far off. The IOS version is kind of "in the middle" of the versions we're using, so that would make sense. Looks like a stack of 2 switches? Out of our affected switches, only one of them is a stack.
Cisco is giving us a bit of a runaround right now; hoping our SE will step in. I already asked for the case to be moved to another engineer while ours was taking a day off, but they didn't listen to me. I gave him all the info he wanted in terms of "show" commands on 3 non-working switches and 1 working one, plus all the stats I generated, including an SNMP walk from one of the non-working switches. But now he wants it from 2 other switches, 1 working and 1 not, saying that getting it from these other switches will make it "easier to look up and reproduce". Why does he need an snmpwalk from a working device from me? It's working!
Sorry, a bit annoyed with them...
Going to open a case on yours?
Oh hey, we might have figured out a commonality that could help explain why some switches are that way and not others. I've only gone through a few of my switches so far, but it seems to be limited to switches that either are 100% full with a dynamically assigned VLAN (other than VLAN 1) on each port, or have every end-user port statically configured to a VLAN other than VLAN 1. On top of that, the trunk ports might need specific VLANs allowed on them too.
But one other "symptom" we've figured out: you can do the SNMP walk for one of the active VLANs on the switch, but not for VLAN 1. I.e., if all your switchports have "switchport access vlan 34" on them, the first command below works while the second returns nothing:
snmpwalk -v 2c -c MySNMPCommunity@34 <IP of Switch> 1.3.6.1.2.1.17.1.4.1.2
snmpwalk -v 2c -c MySNMPCommunity <IP of Switch> 1.3.6.1.2.1.17.1.4.1.2
Note the "@34" appended to the community string in the first example to select that VLAN's context...
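To make the diagnostic above concrete, here's a minimal Python sketch of the check, assuming Cisco's documented community string indexing (append "@&lt;vlan-id&gt;" to the community string to select that VLAN's bridge context). The helper names and the row format are illustrative, not part of any real tool; plug in whatever SNMP library or subprocess call you actually use to do the walks.

```python
# OID named later in this thread: dot1dBasePortIfIndex from BRIDGE-MIB.
DOT1D_BASE_PORT_IFINDEX = "1.3.6.1.2.1.17.1.4.1.2"

def community_for_vlan(community, vlan=None):
    """Cisco community string indexing: 'community@vlan' selects that VLAN's
    bridge context; no suffix means the default (VLAN 1) context."""
    return community if vlan is None else f"{community}@{vlan}"

def looks_affected(default_rows, vlan_rows):
    """True when the symptom described above is present: a walk of
    dot1dBasePortIfIndex in a specific VLAN's context returns rows, but the
    same walk in the default (VLAN 1) context returns nothing."""
    return bool(vlan_rows) and not default_rows
```

So for the "switchport access vlan 34" case, you'd walk once with `community_for_vlan("MySNMPCommunity", 34)` and once with the bare community, then feed both result lists to `looks_affected`.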
Our network team has this in their queue, but I'm not sure what traction it would get above projects and the like. We'll see, I guess.
So, did some more testing with both an Arista switch and an older Cisco switch.
Long story short, it's probably not an issue that affects multiple vendors. The Arista switch always returned info when that OID was walked, no matter how the ports were configured.
However, on an old Cisco 2960-24TT-L, I observed the same behavior, both via snmpwalk and via "List Resources" missing the "VLAN" box in Orion. When all ports were configured in VLANs, the issue was present; when at least one port wasn't in a VLAN, it wasn't. This was a 12.2(X) strain of IOS, so this issue has probably been around for quite a while!
So, it's very possible that if you have Cisco switches, quite a bit of information might be missing from them!
Interestingly, I did a List Resources on one of them and it showed a cached version. I did a Force Refresh and then it showed the VLAN option, since one of the interfaces now had VLAN 1 on it. On the other 2, once I added an interface to VLAN 1, I got the VLAN option.
So, Solarwinds has apparently decided that this is not a bug. According to them "I've discussed your findings with other members of our team to discuss if there are any other options that we have to resolve your issue. Unfortunately what you're wanting is currently not a feature of the product. You can submit a Feature Request but there is no timeline on if or when it will be implemented."
This approach has definitely annoyed me. I find it to definitely be a bug, in that they're not interpreting the results of their polling correctly...
What are your thoughts? Bug, or Feature Request?
Sounds like what you are describing is a legitimate bug in IOS. I'm not sure what the feature request would be in NPM, but maybe I'm missing something here.
So I would say this is less of a bug and more of a non-standard SNMP "feature" that Cisco has implemented. They have it documented in various places, but I am not aware of any other vendors who do it the way Cisco does.
I looked at the IETF standard for the bridge MIBs and really cannot see anything there that indicates they intended that kind of capability, i.e. getting different results from an OID via these "@" contexts.
So List Resources only tests with the community string it was given, and it looks like the Cisco default behavior is that if no context is given on the community string, it defaults to VLAN 1, which may or may not be in use.
Not sure of a clean solution to the issue.
If there is another OID that just gives a list of all configured VLAN IDs, SW could query that and then do a series of SNMP walks, appending each context, but that sounds like it could become really taxing in terms of polling load, especially on polled devices with lots of VLANs configured.
So, from how Cisco explains it, that particular object (dot1dBasePortIfIndex) is a list of the ifIndexes of ports in a given VLAN. Since the default context is VLAN 1, if no ports are in VLAN 1, they return no ports. They say they interpret this the same way across all devices and IOS versions, and from what I've seen they do.
There are other OIDs from which you get the list of VLANs, their names, and such. I.e., if you query the OID below, you get a list similar to what is shown.
VLAN Name: 1.3.6.1.4.1.9.9.46.1.3.1.1.4
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1' => "default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.10' => "VLAN0010"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.100' => "VLAN0100"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.101' => "VLAN0101"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.102' => "vlan102"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.109' => "VLAN0109"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1000' => "VLAN1000"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1001' => "VLAN1001"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1002' => "fddi-default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1003' => "token-ring-default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1004' => "fddinet-default"
'1.3.6.1.4.1.9.9.46.1.3.1.1.4.1.1005' => "trnet-default"
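The VLAN ID is the last sub-identifier of each returned OID, so extracting the ID-to-name map from raw snmpwalk output is a one-liner-ish parse. A minimal sketch (my own helper, not anything SolarWinds ships), assuming the common `OID = STRING: "name"` output shape:

```python
import re

# vtpVlanName from CISCO-VTP-MIB; each row's OID ends in .1.<vlan-id>.
VTP_VLAN_NAME = "1.3.6.1.4.1.9.9.46.1.3.1.1.4"

def parse_vlan_names(walk_lines):
    """Map VLAN ID -> VLAN name from raw snmpwalk output lines."""
    row = re.compile(rf'{re.escape(VTP_VLAN_NAME)}\.1\.(\d+)\s*=\s*STRING:\s*"(.*)"')
    vlans = {}
    for line in walk_lines:
        m = row.match(line.strip())
        if m:
            vlans[int(m.group(1))] = m.group(2)
    return vlans
```

Feeding it the walk output above would give you `{1: "default", 10: "VLAN0010", ...}`, which is exactly the VLAN list you'd need to build the per-context polls discussed earlier.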
Taxing in terms of polling load? SolarWinds is supposed to do this for all devices, so I'm not really sure why it would be more taxing than its regular behavior...
So to be sure to know about all the port mappings wouldn't they need to poll
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@1
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@10
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@100
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@101
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@102
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@109
1.3.6.1.2.1.17.1.4.1.2 via <mycommunity>@1000
If they don't poll all the used VLANs, wouldn't they always run the risk of having the problem you started this thread about in the first place? Or do all interfaces show up even if an interface doesn't carry that VLAN, so we'd just need to poll the first existing VLAN?
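The "walk it once per VLAN context" idea above can be sketched in a few lines. This is a hypothetical plan builder, not NPM internals; one assumption I'm making (worth verifying) is that the reserved media defaults 1002-1005 (fddi/token-ring) can be skipped, since they carry no Ethernet ports:

```python
# dot1dBasePortIfIndex, the BRIDGE-MIB table discussed in this thread.
DOT1D_BASE_PORT_IFINDEX = "1.3.6.1.2.1.17.1.4.1.2"

# Assumption: the fddi/token-ring default VLANs never hold Ethernet ports,
# so walking their contexts would be wasted polls.
RESERVED_DEFAULTS = range(1002, 1006)

def polling_plan(community, vlan_ids):
    """One (OID, indexed community string) pair per VLAN context to walk;
    merging the results covers every port regardless of its access VLAN."""
    return [(DOT1D_BASE_PORT_IFINDEX, f"{community}@{v}")
            for v in sorted(set(vlan_ids)) if v not in RESERVED_DEFAULTS]
```

Fed the VLAN list from the vtpVlanName walk earlier, this reproduces the @1/@10/@100/... series above, which also shows where the load concern comes from: the number of walks scales with the number of configured VLANs per device.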