8 Replies Latest reply on Nov 9, 2009 12:36 PM by byrona

    Cisco Switch Socket Overflow Problem


      When I use Orion to poll several of my Cisco switches, I get a socket overflow; the message is below.  I found a reference to this problem in an older thread in the Thwack forums:

      Re: True PC spec required?

      What can I do to resolve this problem?

      %IP-6-UDP_SOCKOVFL:UDP socket overflow from Source IP: SNMP SERVER, Destination port: 161

        • Re: Cisco Switch Socket Overflow Problem

          This is a message from your switch indicating that it cannot handle the volume of incoming SNMP messages.  If the source of this is the Orion server, then you can do any of the following to help the situation:

          1. Change the polling intervals on your interfaces to poll less frequently.

          2. Poll fewer UnDPs (Universal Device Pollers), if you have any.

          3. Manage fewer interfaces on the switch.

          I realize you may not want to do #3, but wanted to list it as an option.
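
          As a rough illustration of why these options reduce load, here is a small Python sketch of the average request rate a switch would see if every counter is fetched with its own request. The function name and all of the numbers (48 interfaces, 10 counters each, 120-second interval) are made up for the example; they are not Orion defaults.

```python
def snmp_requests_per_second(interfaces, counters_per_interface, interval_seconds):
    """Average SNMP requests per second, assuming one request per counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return interfaces * counters_per_interface / interval_seconds


# Hypothetical switch: 48 managed interfaces, 10 counters each.
baseline = snmp_requests_per_second(48, 10, 120)          # polled every 2 minutes
longer_interval = snmp_requests_per_second(48, 10, 600)   # option 1: poll less often
fewer_interfaces = snmp_requests_per_second(24, 10, 120)  # option 3: manage fewer interfaces

print(baseline, longer_interval, fewer_interfaces)
```

          Note that this is only an average over the whole interval; it says nothing about how tightly the requests are bunched together within one poll.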

            • Re: Cisco Switch Socket Overflow Problem

              We tried changing the polling interval and scaled it way back, but that didn't resolve the problem.  What we found is that the problem occurs within a single polling interval.

              We are not currently using UnDPs on the devices in question.

              Several of the devices only have a few interfaces so that shouldn't be the problem.

              We set up Wireshark, and it looks like Orion is issuing a separate SNMP Get for each data point it collects, which in turn overwhelms the buffer on the switch.

              Most NMS systems that I have worked with have used SNMP get/get-next, bulk-get, or snmpwalk.
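
              For comparison, here is a rough Python sketch of how many request PDUs a poller would send per cycle when it issues one Get per data point versus walking the interface table with GetBulk (SNMPv2c and later). The function names and the figures (48 interfaces, 10 counters, max-repetitions of 10) are made up for illustration, and the model ignores response size limits that can make an agent return fewer rows than requested. It is not a description of how Orion actually polls.

```python
def per_data_point_get_requests(rows, columns):
    """One Get request per table cell, i.e. one request per data point."""
    return rows * columns


def getbulk_walk_requests(rows, max_repetitions):
    """Requests needed to walk a table with GetBulk.

    Assumes each request carries one repeating varbind per column and the
    agent returns up to max_repetitions rows per response. The walk ends on
    the first response that contains a row past the end of the table, which
    gives rows // max_repetitions + 1 requests.
    """
    if max_repetitions < 1:
        raise ValueError("max_repetitions must be at least 1")
    return rows // max_repetitions + 1


# Hypothetical switch: 48 interfaces (rows), 10 counters each (columns).
single_gets = per_data_point_get_requests(48, 10)
bulk_gets = getbulk_walk_requests(48, max_repetitions=10)

print(single_gets, bulk_gets)
```

              Under these assumptions the per-data-point approach sends 480 requests per cycle while the GetBulk walk sends 5, which is the kind of difference that would matter to a small UDP receive buffer.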

              In the past we have used several other NMS systems including HP OpenView and OpenNMS to monitor the same objects on the same devices and we have never had this problem.

              I would be interested in better understanding how Orion actually does its data collection and what options I have to tune it.

              • Re: Cisco Switch Socket Overflow Problem

                Is this something that I should open a support ticket on?

                  • Re: Cisco Switch Socket Overflow Problem

                    Have you checked with Cisco TAC about a possible bug?  I monitor hundreds of Cisco devices with NPM and have never seen this.

                      • Re: Cisco Switch Socket Overflow Problem

                        What polling interval are you using, and do your switches (assuming there are some switches in there) have a lot of interfaces that you are collecting data on?

                        I'm just curious to get some more specific data on what exactly you are doing.

                        What I noticed is that making the polling interval much longer didn't change the problem; it seems to occur within a single poll because the sheer number of SNMP Gets overwhelms the device.
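
                        To show why the interval may not matter here, below is a small Python sketch of a bounded receive queue being hit by one poll's worth of requests. Everything in it is an assumption for illustration: the function name, the tick-based model, the queue depth of 50, and the arrival and service rates are not measured values from any Cisco device or from Orion.

```python
def dropped_in_one_poll(burst_size, arrivals_per_tick, served_per_tick, queue_depth):
    """Count requests dropped while one poll's burst hits a bounded queue.

    Each tick, up to arrivals_per_tick requests arrive, anything that does
    not fit in the queue is dropped, and then the agent serves up to
    served_per_tick queued requests.
    """
    queued = 0
    dropped = 0
    remaining = burst_size
    while remaining > 0:
        arriving = min(arrivals_per_tick, remaining)
        remaining -= arriving
        accepted = min(arriving, queue_depth - queued)
        dropped += arriving - accepted
        queued += accepted
        queued -= min(queued, served_per_tick)
    return dropped


# Hypothetical poll of 480 requests against a queue that holds 50.
bursty = dropped_in_one_poll(480, arrivals_per_tick=100, served_per_tick=20, queue_depth=50)
paced = dropped_in_one_poll(480, arrivals_per_tick=20, served_per_tick=20, queue_depth=50)

print(bursty, paced)
```

                        The polling interval is not even a parameter in this model. Only the size and pacing of the burst within a single poll determine whether requests are dropped, which matches what I am seeing.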

                        Thanks in advance for any additional information that you can provide.

                    • Re: Cisco Switch Socket Overflow Problem

                      OK, further investigation and testing have found that this is due to the number of interfaces on the device.

                      So, my question is how do you manage a device with a bunch of interfaces where the usage data on those interfaces is important?

                        • Re: Cisco Switch Socket Overflow Problem

                          Have you checked the Cisco bug database or checked with TAC?  We have many customers monitoring tons of interfaces on large switches with no problems.

                            • Re: Cisco Switch Socket Overflow Problem

                              I have not been able to find any info suggesting this is a known problem, other than references to the specific error message with suggestions on what to do when it occurs.

                              The specific device that we have this problem on is a very old device that Cisco has declared end-of-life, so it may just be a problem due to the vintage of the hardware.  We eventually plan to replace this device, so at this point I am just going to leave it alone.

                              I will be sure to note if I start to see this problem on more devices.