Just a heads-up to the community. After troubleshooting for more time than I'd like to admit, our network team traced our problem to a recently posted bug in Cisco Catalyst 4000 series switches.
Basically, all it takes is an SNMP walk of the affected OID tree and (wham!) every device connected to that switch, or with traffic traversing it, sees packet loss. In our experience, the loss lasted up to about 30 seconds as measured by SolarWinds NPM, but applications may be impacted longer depending on how well they recover.
Fortunately, NPM does NOT, I repeat, does NOT query this OID or OID tree by default. You might have a UnDP poller out there that does, but this is not an out-of-the-box action. I can't comment on other vendors. Obviously, we do have a platform that queries that OID.
This query might help you find blocks of that packet loss. Because we have multiple polling engines, we found it helpful to query individual polling engines so the groups of data are easier to see. This is just a quick and simple query. Someone could dress it up to count packet loss incidents and flag the blocks where the number of incidents between the first and last event exceeds 10 (or some other threshold). For our needs, this query works, and the same logic made a nice SAM component that we built an alert around.
The query looks at the last hour. Salt to taste, as adatole would say.
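-- SELECT/FROM/WHERE/ORDER BY lines reconstructed (missing from the snippet); adjust columns for your Orion version
SELECT AllEngines.ServerName, NodesData.Caption, ResponseTime_Detail.DateTime, ResponseTime_Detail.PercentLoss
FROM ResponseTime_Detail
JOIN NodesData ON ResponseTime_Detail.NodeID = NodesData.NodeID
JOIN AllEngines ON NodesData.EngineID = AllEngines.EngineID
WHERE ResponseTime_Detail.DateTime > DATEADD(hour, -1, GETDATE())  -- last hour
AND PercentLoss <> 0
AND NodesData.Unmanaged <> 1
AND NodesData.Status = 1
--AND AllEngines.EngineID = 10  -- uncomment to look at a single polling engine
ORDER BY AllEngines.ServerName, ResponseTime_Detail.DateTime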
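If you want to count the blocks of loss instead of eyeballing them, here is a rough Python sketch of that idea. It assumes you have already pulled the query's rows (engine name, node caption, DateTime, PercentLoss) into a list of tuples. The 2-minute gap that separates one block from the next and the "more than 10 samples" threshold are hypothetical defaults, not what our SAM component uses, so salt to taste.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# Hypothetical defaults (assumptions, not from the original post); tune to your polling interval.
MAX_GAP = timedelta(minutes=2)  # loss samples further apart than this start a new block
MIN_SAMPLES = 10                # flag blocks with MORE than this many loss samples

# One query row: (engine ServerName, node Caption, sample DateTime, PercentLoss)
Row = Tuple[str, str, datetime, float]


def loss_blocks(rows: List[Row], max_gap: timedelta = MAX_GAP) -> Dict[str, List[List[Row]]]:
    """Group non-zero PercentLoss samples into time-contiguous blocks per polling engine."""
    by_engine: Dict[str, List[Row]] = {}
    for row in rows:
        if row[3] != 0:  # mirrors "AND PercentLoss <> 0" in the SQL
            by_engine.setdefault(row[0], []).append(row)

    blocks: Dict[str, List[List[Row]]] = {}
    for engine, samples in by_engine.items():
        samples.sort(key=lambda r: r[2])  # order by sample time
        engine_blocks: List[List[Row]] = [[samples[0]]]
        for prev, cur in zip(samples, samples[1:]):
            if cur[2] - prev[2] > max_gap:
                engine_blocks.append([cur])  # gap too large: start a new block
            else:
                engine_blocks[-1].append(cur)  # same burst of loss
        blocks[engine] = engine_blocks
    return blocks


def suspicious_blocks(
    rows: List[Row], max_gap: timedelta = MAX_GAP, min_samples: int = MIN_SAMPLES
) -> List[Tuple[str, datetime, datetime, int]]:
    """Return (engine, first event, last event, sample count) for blocks above the threshold."""
    result = []
    for engine, engine_blocks in loss_blocks(rows, max_gap).items():
        for block in engine_blocks:
            if len(block) > min_samples:
                result.append((engine, block[0][2], block[-1][2], len(block)))
    return result
```

Grouping per engine keeps one poller's burst from being merged with another's, which is the same reason we found it easier to look at one polling engine at a time. Any non-empty result from `suspicious_blocks` is the kind of condition you could hang an alert on.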