Cisco Switch Socket Overflow Problem

byrona over 14 years ago

When I use Orion to poll several of my Cisco switches, I get a socket overflow; the message is below. I found a reference to this problem in an older thread in the Thwack forums.

What can I do to resolve this problem?

%IP-6-UDP_SOCKOVFL:UDP socket overflow from Source IP: SNMP SERVER, Destination port: 161
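
For readers unfamiliar with the format, Cisco IOS log messages follow the pattern %FACILITY-SEVERITY-MNEMONIC: description, so the line above is a severity 6 (informational) message from the IP facility about UDP port 161, the standard SNMP agent port. A minimal Python sketch of pulling those fields apart follows; the function name and the regular expressions are illustrative assumptions, not part of Orion or IOS.

```python
import re

# Assumed pattern for the %FACILITY-SEVERITY-MNEMONIC: description layout.
PATTERN = re.compile(
    r"%(?P<facility>[A-Z0-9_]+)-(?P<severity>[0-7])-(?P<mnemonic>[A-Z0-9_]+):\s*(?P<text>.*)"
)

def parse_ios_message(line):
    """Split an IOS log line into its fields; return None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["severity"] = int(fields["severity"])
    # Pick out the destination port mentioned in the free-text part, if any.
    port = re.search(r"Destination port:\s*(\d+)", fields["text"])
    fields["dest_port"] = int(port.group(1)) if port else None
    return fields

msg = ("%IP-6-UDP_SOCKOVFL:UDP socket overflow from Source IP: SNMP SERVER, "
       "Destination port: 161")
parsed = parse_ios_message(msg)
print(parsed["facility"], parsed["severity"], parsed["mnemonic"], parsed["dest_port"])
```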

Karlo.Zatylny over 14 years ago

This is a message from your switch indicating that it cannot handle the volume of incoming SNMP messages. If the source of this is the Orion server, then you can do any of the following to help the situation:
1. Change the polling intervals on your interfaces to poll less frequently.
2. Poll fewer UnDPs (Universal Device Pollers), if you have any.
3. Manage fewer interfaces on the switch.
I realize you may not want to do #3, but I wanted to list it as an option.

byrona over 14 years ago in reply to Karlo.Zatylny

We tried changing the polling interval and scaled it way back, but that didn't resolve the problem. What we found is that the problem happens within a single polling interval.
We are not currently using UnDPs on the devices in question.
Several of the devices have only a few interfaces, so that shouldn't be the problem.
We set up Wireshark, and it looks like Orion is doing a single SNMP Get for each data point it collects, which in turn overwhelms the buffer on the switch.
Most NMS systems that I have worked with have used SNMP get/get-next, bulk-get, or snmpwalk.
In the past we have used several other NMS systems, including HP OpenView and OpenNMS, to monitor the same objects on the same devices, and we have never had this problem.
I would be interested in better understanding how Orion actually does its data collection and what options I have to tweak it.
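
To make the difference between one Get per data point and a bulk-get concrete, here is a rough Python sketch that only counts request packets. All numbers (48 interfaces, 4 counters each, max-repetitions of 10) are hypothetical, the GETBULK figure is an idealized lower bound that assumes every response fits in one datagram, and GETBULK itself requires SNMPv2c or v3, which a very old device may not support. It does not describe how Orion actually polls.

```python
import math

def get_requests(interfaces, counters_per_interface):
    """One GET request per data point, the pattern seen in the capture."""
    return interfaces * counters_per_interface

def getbulk_requests(interfaces, max_repetitions):
    """Idealized GETBULK count: each request asks for all counter columns
    and returns up to max_repetitions table rows (interfaces) at once."""
    return math.ceil(interfaces / max_repetitions)

# Hypothetical switch: 48 interfaces, 4 counters polled on each.
single = get_requests(48, 4)
bulk = getbulk_requests(48, 10)
print(f"GET requests per poll:     {single}")
print(f"GETBULK requests per poll: {bulk}")
```

Under these assumptions the same data arrives in a small fraction of the request packets, which is why a bulk-capable poller is less likely to fill the UDP input queue on the agent.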

byrona over 14 years ago in reply to Karlo.Zatylny

Is this something that I should open a support ticket on?

Donald_Francis over 14 years ago in reply to byrona

Have you checked with TAC on a possible bug? I monitor hundreds of Cisco devices with NPM and have never seen this.

byrona over 14 years ago in reply to Donald_Francis

What kind of polling interval are you using, and do your switches (assuming there are some switches in there) have a lot of interfaces that you are collecting data on?
I'm just curious to get some more specific data on what exactly you are doing.
What I noticed is that making the polling interval much less frequent didn't seem to change this problem; it seems to occur within a single poll due to the sheer number of SNMP gets overwhelming the device.
Thanks in advance for any additional information that you can provide.
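
The point that the overflow happens inside a single poll, regardless of how far apart polls are, can be illustrated with a toy model of a bounded input queue. The Python sketch below is only a simulation under made-up assumptions (requests sent per tick, requests served per tick, queue capacity); none of the figures come from IOS or Orion.

```python
def simulate_burst(requests, send_per_tick, serve_per_tick, queue_capacity):
    """Return how many requests are dropped when a burst hits a bounded queue."""
    queued = 0
    dropped = 0
    remaining = requests
    while remaining > 0:
        # The poller sends its next batch of requests.
        arriving = min(send_per_tick, remaining)
        remaining -= arriving
        # The agent accepts only what still fits in its queue.
        accepted = min(arriving, queue_capacity - queued)
        dropped += arriving - accepted
        queued += accepted
        # The agent answers a few queued requests before the next batch.
        queued -= min(queued, serve_per_tick)
    return dropped

# Hypothetical figures: the same agent, a large burst versus a small one.
large_burst = simulate_burst(200, send_per_tick=20, serve_per_tick=5, queue_capacity=50)
small_burst = simulate_burst(40, send_per_tick=20, serve_per_tick=5, queue_capacity=50)
print(large_burst, small_burst)
```

The polling interval does not appear in the model at all: as long as the queue drains between polls, only the size and pace of the burst within one poll decide whether anything is dropped.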

byrona over 14 years ago in reply to Karlo.Zatylny

OK, further investigation and testing have shown that this is due to the number of interfaces on the device.
So my question is: how do you manage a device with a lot of interfaces when the usage data on those interfaces is important?

bshopp over 14 years ago in reply to byrona

Have you checked the Cisco bug database or checked with TAC? We have many customers monitoring tons of interfaces on large switches with no problems.

byrona over 14 years ago in reply to bshopp

I have not been able to find any info suggesting this is a known problem, other than references to the specific error message with suggestions on what to do when it occurs.
The specific device that we have this problem on is a very old device that has been end-of-lifed by Cisco, so it may just be a problem due to the vintage of the hardware. We eventually plan to replace this device, so at this point I am just going to leave it alone.
I will be sure to note if I start to see this problem on more devices.