Very high ping times on NPM server

FormerMember

In Orion NPM 9.5.1, I notice that many of our nodes are reporting latency times in the thousands of ms.

If I ping any of those devices from my own computer, they respond within a few ms. Yet if I connect to the NPM server and run the pings from there, it reports anything from 1ms to 6000ms. It is very random and sporadic. As far as we can tell, this only happens from the Orion server.

It may just be a coincidence that it is the NPM server, or it could be something in the software causing it.

Has anyone seen behavior like this before? I saw one post about someone with Symantec AV having a similar issue, but I have Trend Micro installed, and I did try turning it off and testing again.

Thank you.

  • One thing to check is the status of the poller on your NPM box. In the web console, go to Admin > Details > Polling Engines and make sure your poller isn't overburdened; that alone could cause a problem like this. Also look at the overall health of the server and OS that NPM is running on.

  • FormerMember in reply to ecklerwr1

    This is what I have under Polling Engines (see below).

    The OS is Windows Server 2003 Enterprise x64 with SP2. The hardware is an HP ProLiant BL465c G1 with 8 GB RAM and four 2.8 GHz cores. It should have plenty of power. Task Manager shows occasional CPU spikes up to 10%, but most of the time it is barely breathing.

    Network Performance Monitor Polling Engines
    Last Database Update     Now
    Web Engine Running Since     5/3/2010 1:53:53 PM   * testing purpose only

    Polling Engine on CLTAAPD3
    Engine Status     Polling Engine Active
    Type of Polling Engine     Primary
    Polling Engine Version     Engine Version 9.5.1 - SolarWinds Orion Network Performance Monitor v9.5.1
    IP Address     10.42.10.24
    Last Restart     4/30/2010 1:54:47 PM
    Last Database Sync     1 second ago
    Last Fail-Over     Never
    Elements     2242

  • Here is the important part to look at, as it appears on mine:

    Polling Engine on :}
    Engine Status     Status  Polling Engine Active
    Type of Polling Engine     Primary
    Polling Engine Version     Engine Version 2010.1.0 - SolarWinds Orion Core Services 2010.1
    IP Address     x.x.x.x
    Last Restart     4/23/2010 11:42:46 AM
    Last Database Sync     Now
    Last Fail-Over     Never
    Elements     246
    Network Node Elements     28
    Interface Elements     210
    Volume Elements     8
    Date Time     5/3/2010 11:09:27 AM
    Paused     False
    ICMP Status Polling Index     246 out of 246
    SNMP Status Polling Index     246 out of 246
    ICMP Status Polls per second     0
    SNMP Status Polls per second     0
    Max Status Polls Per Second     30
    DNS Outstanding     0
    ICMP Outstanding     0
    SNMP Outstanding     11
    ICMP Statistic Polling Index     473 out of 473
    SNMP Statistic Polling Index     473 out of 473
    ICMP Statistic Polls per second     0
    SNMP Statistic Polls per second     5.5
    Max Statistic Polls Per Second     30

  • FormerMember in reply to ecklerwr1

    Oops, sorry, I meant to paste that part.

    Network Performance Monitor Polling Engines
    Last Database Update     Now
    Web Engine Running Since     5/3/2010 1:53:53 PM   * testing purpose only

    Polling Engine on CLTAAPD3
    Engine Status     Polling Engine Active
    Type of Polling Engine     Primary
    Polling Engine Version     Engine Version 9.5.1 - SolarWinds Orion Network Performance Monitor v9.5.1
    IP Address
    Last Restart     4/30/2010 1:54:47 PM
    Last Database Sync     1 second ago
    Last Fail-Over     Never
    Elements     2242
    Network Node Elements     457
    Interface Elements     952
    Volume Elements     833
    Date Time     5/3/2010 2:02:48 PM
    Paused     False
    ICMP Status Polling Index     2242 out of 2242
    SNMP Status Polling Index     2242 out of 2242
    ICMP Status Polls per second     0.5
    SNMP Status Polls per second     3
    Max Status Polls Per Second     30
    DNS Outstanding     0
    ICMP Outstanding     0
    SNMP Outstanding     24
    ICMP Statistic Polling Index     3540 out of 3540
    SNMP Statistic Polling Index     3540 out of 3540
    ICMP Statistic Polls per second     0
    SNMP Statistic Polls per second     9
    Max Statistic Polls Per Second     30

  • Your poller looks pretty good considering how many elements you have on it. Next thing:

    Are you running SQL Server on the same box or a separate one? Since your issue seems to be intermittent, the next thing I would do is run perfmon on the NPM server and let it collect for a while, watching counters such as disk queue length to see whether the hardware is straining at the moments your response times jump through the roof.
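To line perfmon data up against the ping spikes, it can help to export the counter log to CSV and list only the samples above a threshold. The Python below is a minimal sketch under stated assumptions: the CSV layout (timestamp in the first column, one column per counter whose header contains the counter name, such as "Avg. Disk Queue Length"), the function name `high_samples`, and the 2.0 threshold are illustrative choices, not something perfmon or NPM prescribes.

```python
import csv
import io
from typing import List, TextIO, Tuple


def high_samples(log: TextIO, counter_name: str, threshold: float) -> List[Tuple[str, float]]:
    """Return (timestamp, value) pairs where the chosen counter exceeds threshold.

    Assumes a perfmon-style CSV export: the first column holds the sample
    timestamp and each other column header contains a counter path.
    """
    reader = csv.reader(log)
    header = next(reader)
    matches = [i for i, name in enumerate(header) if counter_name in name]
    if not matches:
        raise ValueError(f"no column header contains {counter_name!r}")
    col = matches[0]
    hits: List[Tuple[str, float]] = []
    for row in reader:
        if len(row) <= col:
            continue  # short or truncated row
        try:
            value = float(row[col])
        except ValueError:
            continue  # skip blank or non-numeric cells
        if value > threshold:
            hits.append((row[0], value))
    return hits


if __name__ == "__main__":
    # Hypothetical three-sample log; a real one could be read with open("disk.csv", newline="").
    sample = io.StringIO(
        '"(PDH-CSV 4.0)","Avg. Disk Queue Length"\n'
        '"05/03/2010 14:00:00","0.5"\n'
        '"05/03/2010 14:00:15","12.0"\n'
        '"05/03/2010 14:00:30","0.8"\n'
    )
    for stamp, value in high_samples(sample, "Avg. Disk Queue Length", threshold=2.0):
        print(stamp, value)  # prints: 05/03/2010 14:00:15 12.0
```

Because each hit keeps its timestamp, the list can be compared directly with the times the high ping readings were seen.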

  • FormerMember in reply to ecklerwr1

    SQL is running on a separate server. 

    I guess what I mean by intermittent is that the ping results aren't consistent.

    If I run "ping servername -t", the first 2 pings come back at 5ms, the next 10 at 5000ms, the next 1 at 5ms, the next 30 at 3500ms, and so on.

    I have updated the NIC drivers on the server, thinking that might have something to do with it, and checked the switch port for errors. It just doesn't make sense unless it is a flaky network card.
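One way to put numbers on a pattern like the one described above is to save the `ping -t` output to a text file and summarize it. The Python below is a minimal sketch, assuming English-locale Windows `ping` replies such as `Reply from 10.42.10.24: bytes=32 time=5ms TTL=128`; the function names `parse_times` and `slow_runs` and the 1000 ms threshold are hypothetical choices, not part of NPM or Windows.

```python
import re
from typing import Iterable, List, Optional

# Assumption: English-locale Windows ping output, for example
#   "Reply from 10.42.10.24: bytes=32 time=5ms TTL=128"
#   "Reply from 10.42.10.24: bytes=32 time<1ms TTL=128"
#   "Request timed out."
REPLY_RE = re.compile(r"time[=<](\d+)ms")


def parse_times(lines: Iterable[str]) -> List[Optional[int]]:
    """Return the reported round-trip time in ms for each reply line.

    Timeouts are recorded as None; header and summary lines are skipped.
    A "time<1ms" reply is recorded as 1 (an upper bound).
    """
    times: List[Optional[int]] = []
    for line in lines:
        line = line.strip()
        match = REPLY_RE.search(line)
        if match:
            times.append(int(match.group(1)))
        elif line.startswith("Request timed out"):
            times.append(None)
    return times


def slow_runs(times: List[Optional[int]], threshold_ms: int = 1000) -> List[int]:
    """Return the lengths of consecutive runs of slow replies.

    A reply counts as slow if it timed out or took at least threshold_ms.
    """
    runs: List[int] = []
    current = 0
    for t in times:
        if t is None or t >= threshold_ms:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


if __name__ == "__main__":
    # Hypothetical sample shaped like the post: 2 fast, 10 slow, 1 fast, 30 slow.
    fast = "Reply from 10.0.0.1: bytes=32 time=5ms TTL=128"
    sample = ([fast] * 2
              + ["Reply from 10.0.0.1: bytes=32 time=5000ms TTL=128"] * 10
              + [fast]
              + ["Reply from 10.0.0.1: bytes=32 time=3500ms TTL=128"] * 30)
    print(slow_runs(parse_times(sample)))  # prints [10, 30]
```

For a real capture, an open file can be passed straight in, e.g. `with open("ping.txt") as f: print(slow_runs(parse_times(f)))`. Comparing the run lengths with NPM services running and stopped gives a rough, repeatable way to tell whether anything changed.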

  • So I assume this is also causing NPM to show high response times in the graphs for all of your nodes. When you ping the NPM server from your client machine with the -t option, do the ping times also rise sporadically? I would pretty much assume they would. It definitely sounds like you've narrowed the issue down to the server NPM is running on. One way to separate the NPM processes from the server hardware underneath them is to stop all of the NPM services and then run ping -t against the NPM server from your client. If the ping times still rise while the services aren't running, that rules out the NPM services (for example, an overloaded poller) as part of the problem. I hope that makes sense.

  • When you run "ping -t" from the command line, is there an actual five-second pause when it reports 5000ms, or do those replies arrive at the same interval as the ones with smaller numbers?
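That question can also be answered by timing each echo request independently and comparing the wall-clock duration with what `ping` reports: a reply cannot really take longer than the whole `ping.exe` run, so a reported 5000 ms inside a run that finished in a fraction of a second would point at timekeeping on the server rather than the network. The Python below is a minimal sketch under assumptions: it calls Windows `ping` with `-n 1` (one request) and `-w 10000` (timeout in ms), parses English-locale output, trusts the clock Python reads, and uses hypothetical names (`probe`, `is_impossible`) and an arbitrary 250 ms slack.

```python
import re
import subprocess
import sys
import time
from typing import List, Optional, Tuple

# Assumption: English-locale Windows ping output ("... time=5ms ..." or "time<1ms").
TIME_RE = re.compile(r"time[=<](\d+)ms")


def reported_ms(output: str) -> Optional[int]:
    """Return the first round-trip time reported in ping output, or None."""
    match = TIME_RE.search(output)
    return int(match.group(1)) if match else None


def is_impossible(reported: Optional[int], elapsed_ms: float, slack_ms: float = 250.0) -> bool:
    """True if the reported RTT is well above the wall-clock time of the whole
    ping.exe run, which a genuine reply cannot be."""
    return reported is not None and reported > elapsed_ms + slack_ms


def probe(host: str, count: int = 20) -> List[Tuple[Optional[int], float]]:
    """Ping host once per iteration (Windows syntax) and record
    (reported RTT in ms, measured wall-clock ms for that ping.exe run)."""
    results: List[Tuple[Optional[int], float]] = []
    for _ in range(count):
        start = time.monotonic()
        completed = subprocess.run(
            ["ping", "-n", "1", "-w", "10000", host],
            capture_output=True,
            text=True,
        )
        elapsed_ms = (time.monotonic() - start) * 1000.0
        results.append((reported_ms(completed.stdout), elapsed_ms))
    return results


if __name__ == "__main__":
    # Usage (hypothetical): python ping_check.py servername
    if len(sys.argv) == 2:
        for rtt, elapsed in probe(sys.argv[1]):
            flag = "  <-- reported time exceeds wall clock" if is_impossible(rtt, elapsed) else ""
            print(f"reported={rtt} ms  wall-clock={elapsed:.0f} ms{flag}")
```

The measured wall-clock figure includes process start-up, so it is always somewhat larger than a genuine RTT; only reported values well above it are treated as suspect.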

  • FormerMember

    I wanted to give an update on this. We were able to fix the problem, and I figured I would post our fix here in case anyone else ever has the same issue. It turned out to be the TCP/IP stack in Windows.

    We ran the following command to fix it:

    netsh int ip reset c:\resetlog.txt
  • Btrotter-

    Excellent, so glad to hear you got the problem resolved and that it didn't require too much. Apparently the command resets a couple of registry keys, which has the effect of reinstalling the protocol.

    The reset command is available in the IP context of the NetShell utility. Follow these steps to use the reset command to reset TCP/IP manually:

    1. To open a command prompt, click Start and then click Run. Copy and paste (or type) the following command in the Open box and then press ENTER:
      cmd
    2. At the command prompt, copy and paste (or type) the following command and then press ENTER:
      netsh int ip reset c:\resetlog.txt
      Note: If you do not want to specify a directory path for the log file, use the following command:
      netsh int ip reset resetlog.txt
    3. Reboot the computer.

    When you run the reset command, it rewrites two registry keys that are used by TCP/IP. This has the same result as removing and reinstalling the protocol. The reset command rewrites the following two registry keys:

    SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\ 
    SYSTEM\CurrentControlSet\Services\DHCP\Parameters\

    To run the manual command successfully, you must specify a file name for the log, in which the actions that netsh takes will be recorded. When you run the manual command, TCP/IP is reset and the actions that were taken are recorded in the log file, known as resetlog.txt in this article.

    The first example, c:\resetlog.txt, specifies the full path where the log will reside. The second example, resetlog.txt, creates the log file in the current directory. In either case, if the specified log file already exists, the new log will be appended to the end of the existing file.