Weird Juniper Ping Problem

We have an issue that occurs about once a week.  It only involves Juniper EX3200/EX4200 switches, and only some of the ones we have deployed and monitor.

About once a week, we get an alert from Solarwinds that one of these devices is down (this ALWAYS occurs in the same order of devices, which makes it even stranger).  When we check the device, it's actually up, responding to pings etc., BUT it will not respond to pings from the Solarwinds platform itself.

I opened several tickets with Solarwinds and they told me that they just rely on the ping utility within Windows 2008, so if the device isn't pingable then it's got to be a Windows 2008 problem.  I have also opened tickets with Juniper, and we have proven that the ping request is arriving at the EX switch and that the response is going back out.  A Wireshark capture on the Windows server shows the ping going out but the reply never coming back.
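For anyone wanting to compare the two captures systematically, here is a minimal sketch of how echo requests could be matched to replies by identifier and sequence number. The `IcmpRecord` type, the `unanswered_requests` helper, and the 2-second timeout are all hypothetical; it assumes the capture has already been exported into simple records and does not parse Wireshark files itself.

```python
from typing import Iterable, List, NamedTuple


class IcmpRecord(NamedTuple):
    """One ICMP echo packet taken from a capture (hypothetical format)."""
    timestamp: float  # capture time in seconds
    kind: str         # "request" or "reply"
    peer: str         # IP address of the switch being pinged
    ident: int        # ICMP identifier field
    seq: int          # ICMP sequence number field


def unanswered_requests(records: Iterable[IcmpRecord],
                        timeout: float = 2.0) -> List[IcmpRecord]:
    """Return echo requests that have no matching reply within `timeout`."""
    ordered = sorted(records, key=lambda r: r.timestamp)

    # Index reply timestamps by (peer, identifier, sequence).
    replies = {}
    for rec in ordered:
        if rec.kind == "reply":
            replies.setdefault((rec.peer, rec.ident, rec.seq), []).append(rec.timestamp)

    missing = []
    for rec in ordered:
        if rec.kind != "request":
            continue
        times = replies.get((rec.peer, rec.ident, rec.seq), [])
        # A reply only counts if it arrives after the request and in time.
        if not any(0 <= t - rec.timestamp <= timeout for t in times):
            missing.append(rec)
    return missing
```

Running this over the capture from the switch side and the capture from the Windows server would show which specific requests were answered on the wire but never seen by the server.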

We have the latest Windows patches and the server is considered "clean" from a software perspective (and free of viruses, of course).  There is nothing in between on the network that can cause this issue, so we focused on the Windows server.

Bit the bullet per Solarwinds' suggestion and blew away the Solarwinds server that does the polling.  Rebuilt it, restored everything, and got the same problem.  The server now also has the latest network card drivers.

Have totally run out of ideas - frustrating for sure and no idea what to do next.  Solarwinds said they are bringing out a feature where ping of the remote device isn't required (thank goodness, we were shocked when that was a requirement during our initial installation).  This new "feature" would at least alleviate this issue.

It's also worth noting that while the pings fail, SNMP continues to poll data successfully.  The alert also triggers every time on the same initial device and follows the same pattern through the other Juniper EX switches at the same time intervals.
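The "same order, same interval" pattern could be checked against exported alert history with something like the sketch below. The event format, the function names, and the one-hour grouping gap are assumptions for illustration, not anything Solarwinds provides.

```python
from typing import List, Tuple

Event = Tuple[float, str]  # (epoch seconds, device name)


def group_incidents(events: List[Event], max_gap: float = 3600.0) -> List[List[Event]]:
    """Group "down" alerts into incidents; a gap above max_gap starts a new one."""
    incidents: List[List[Event]] = []
    for ts, dev in sorted(events):
        if incidents and ts - incidents[-1][-1][0] <= max_gap:
            incidents[-1].append((ts, dev))
        else:
            incidents.append([(ts, dev)])
    return incidents


def signature(incident: List[Event]) -> List[Tuple[str, float]]:
    """Device order plus each device's offset (seconds) from the first alert."""
    start = incident[0][0]
    return [(dev, ts - start) for ts, dev in incident]
```

If every incident produces the same signature, that points toward something timer-driven (a cache or table entry expiring, for example) rather than random loss.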

Thoughts? ;)

  • Try changing the size of the data packet that gets sent.  You can do this under the NPM Polling Settings/ICMP Data.  I've seen odd issues before where the default text is either not big enough or is too large.  It makes absolutely no sense whatsoever, but it happens.  Mess around with smaller and larger sizes of text data to see if that is the issue.

    Also, does the behavior occur at a certain time of day consistently?
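To test payload sizes outside of NPM, a rough sketch of building ICMP echo requests with an arbitrary payload length is shown below. The helper names are made up for this example, the payload bytes are an arbitrary pattern rather than the text NPM sends, and actually sending the packet (which needs a raw socket and administrator rights) is left out.

```python
import struct


def icmp_checksum(data: bytes) -> int:
    """Internet checksum (RFC 1071) over the ICMP header and payload."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold the carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(ident: int, seq: int, payload_size: int) -> bytes:
    """Build an ICMP echo request (type 8, code 0) with payload_size data bytes."""
    payload = bytes(i % 256 for i in range(payload_size))
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", 8, 0, 0, ident, seq)
    checksum = icmp_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, checksum, ident, seq) + payload
```

Looping over a range of sizes (for example 0, 23, 56, and 1472 bytes) and sending each one during a failure window would show whether the drop really depends on packet size.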

  • Various times. It happens roughly every 10 days, but that varies too.

    Changed the size of the data packet and unfortunately it made no difference.

     


