2 Replies Latest reply on Oct 20, 2008 9:36 PM by rcahill

    Unusual monitoring problem

      This has many of us stumped; maybe someone has an idea.  We have run several tests, and the results conflict.


      We monitor several client networks.  One Cisco switch at one site constantly times out on pings from the Orion server: about 90% packet loss, and the remaining 10% of replies come back with very high latency (~1000 ms and higher).  The other switches at this site respond fine to Orion.  Other servers on the same LAN segment as the Orion server also get normal responses from all switches.


      Pings from the Orion server to this one switch often time out, which triggers a device down alert.  Running 'debug ip icmp' on the switch shows that the switch receives the request and sends a reply, but what happens to the reply after that is unclear.  Response times seen on the SolarWinds (SW) server vary widely, often averaging over 900 ms.  However, pings from other devices in the same subnet as the SW server show normal latency and no timeouts.


      When two continuous pings (ping -t) are run simultaneously from the SW server, the replies stop timing out (100% response for both sessions, versus roughly 90% loss when a single session is running), but response times stay high (~1000 ms).  That would seem to implicate the SW server (which runs as a VMware image), yet the issue is seen only with this one switch out of all the devices we monitor.  When other switches at the same site as the affected switch are pinged from SW, the responses are normal.
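
      For anyone who wants to compare the single-session and two-session runs with numbers rather than by eye, here is a rough Python sketch that totals loss and average latency from captured ping output.  It assumes Windows-style ping lines ("Reply from ... time=950ms" and "Request timed out."); the function name summarize_ping is made up for illustration, and the script is not part of Orion.

```python
import re

# Matches the latency field in Windows-style ping replies,
# e.g. "time=950ms" or "time<1ms" (assumed output format).
REPLY_RE = re.compile(r"time[=<](\d+)\s*ms", re.IGNORECASE)


def summarize_ping(lines):
    """Return (sent, received, loss_percent, avg_ms) for captured ping output.

    Hypothetical helper: counts reply lines and "timed out" lines only,
    and ignores everything else (headers, statistics footer).
    """
    latencies = []
    timeouts = 0
    for line in lines:
        match = REPLY_RE.search(line)
        if match:
            latencies.append(int(match.group(1)))
        elif "timed out" in line.lower():
            timeouts += 1
    sent = len(latencies) + timeouts
    loss = 100.0 * timeouts / sent if sent else 0.0
    avg = sum(latencies) / len(latencies) if latencies else None
    return sent, len(latencies), loss, avg
```

      Saving each session with something like "ping -t <switch> > run1.txt" and feeding the file's lines to the function gives a loss percentage and average latency per session that can be compared side by side.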


      When tracing from SW to the remote switch, response times are normal until the path crosses into the client network.  However, when we log into the client device at the edge of their/our network and ping or trace to the affected switch from there, response times are normal.  (One odd thing in the client network: on a traceroute, the same IP always appears on two consecutive hops.)  The remote switch is not directly connected to the remote site's router; it is chained through another switch.  Pings between all switches at this site are normal.
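
      The repeated hop can be picked out of a saved trace automatically.  Below is a rough Python sketch, assuming Windows tracert-style lines where each hop starts with its number and ends with the responding IP (optionally in square brackets after a hostname).  The function name repeated_hops is made up for illustration.

```python
import re

# Hop number at the start of the line, IPv4 address at the end
# (assumed tracert-style layout; hops that timed out do not match).
HOP_RE = re.compile(r"^\s*(\d+)\s+.*?(\d{1,3}(?:\.\d{1,3}){3})\]?\s*$")


def repeated_hops(lines):
    """Return (hop, next_hop, ip) for each pair of adjacent hops sharing an IP."""
    hops = []
    for line in lines:
        match = HOP_RE.match(line)
        if match:
            hops.append((int(match.group(1)), match.group(2)))
    repeats = []
    for (n1, ip1), (n2, ip2) in zip(hops, hops[1:]):
        # Only flag hops that are truly consecutive and answered by the same IP.
        if n2 == n1 + 1 and ip1 == ip2:
            repeats.append((n1, n2, ip1))
    return repeats
```

      This only confirms where the duplicate hop sits in the path.  It does not by itself say whether that device is responsible for the latency.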


      Needless to say, there is a lot of conflicting information, and none of it clearly implicates any particular device.


      I'll be happy to respond to any questions and appreciate any thoughts.


       Thanks.