Very high ping times on NPM server

FormerMember over 13 years ago

In Orion NPM 9.5.1, I notice that many of our nodes are reporting latency times in the thousands of ms.

If I ping any of those devices from my own computer, they respond within a few ms. Yet if I connect to the NPM server and run the pings from there, it reports anything from 1 ms to 6000 ms, seemingly at random. As far as we can tell, this is only happening from the Orion server.

It may just be a coincidence that it is the NPM server, or it could be something in the software causing it.

Has anyone seen behavior like this before? I saw one post about someone with Symantec AV having a similar issue, but I have Trend Micro installed, and I did try turning it off and testing again.

Thank you.

ecklerwr1 over 13 years ago

One thing to check is the status of the poller on your NPM box. From the web console, under Admin, go to Details -> Polling Engines and make sure your poller isn't overburdened; an overloaded poller could cause a problem like this. Also look at the overall health of the server and OS that NPM is running on.

FormerMember over 13 years ago in reply to ecklerwr1

This is what I have under Polling Engines (see below).
The OS is Windows Server 2003 Enterprise x64 with SP2. The hardware is an HP ProLiant BL465c G1 with 8 GB of RAM and four 2.8 GHz cores, so it should have plenty of power. Task Manager shows slight CPU spikes up to 10%, but most of the time it is barely breathing.

Network Performance Monitor Polling Engines
Last Database Update Now
Web Engine Running Since 5/3/2010 1:53:53 PM * testing purpose only

Polling Engine on CLTAAPD3
Engine Status Polling Engine Active
Type of Polling Engine Primary
Polling Engine Version Engine Version 9.5.1 - SolarWinds Orion Network Performance Monitor v9.5.1
IP Address 10.42.10.24
Last Restart 4/30/2010 1:54:47 PM
Last Database Sync 1 second ago
Last Fail-Over Never
Elements 2242

ecklerwr1 over 13 years ago in reply to FormerMember

The important part to look at looks like this on mine:
Polling Engine on :}
Engine Status     Status Polling Engine Active
Type of Polling Engine     Primary
Polling Engine Version     Engine Version 2010.1.0 - SolarWinds Orion Core Services 2010.1
IP Address     x.x.x.x
Last Restart     4/23/2010 11:42:46 AM
Last Database Sync     Now
Last Fail-Over     Never
Elements     246
Network Node Elements     28
Interface Elements     210
Volume Elements     8
Date Time     5/3/2010 11:09:27 AM
Paused     False
ICMP Status Polling Index     246 out of 246
SNMP Status Polling Index     246 out of 246
ICMP Status Polls per second     0
SNMP Status Polls per second     0
Max Status Polls Per Second     30
DNS Outstanding     0
ICMP Outstanding     0
SNMP Outstanding     11
ICMP Statistic Polling Index     473 out of 473
SNMP Statistic Polling Index     473 out of 473
ICMP Statistic Polls per second     0
SNMP Statistic Polls per second     5.5
Max Statistic Polls Per Second     30

FormerMember over 13 years ago in reply to ecklerwr1

Oops, sorry, I meant to paste that part.

Network Performance Monitor Polling Engines
Last Database Update Now
Web Engine Running Since 5/3/2010 1:53:53 PM * testing purpose only

Polling Engine on CLTAAPD3
Engine Status Polling Engine Active
Type of Polling Engine Primary
Polling Engine Version Engine Version 9.5.1 - SolarWinds Orion Network Performance Monitor v9.5.1
IP Address
Last Restart 4/30/2010 1:54:47 PM
Last Database Sync 1 second ago
Last Fail-Over Never
Elements 2242

Network Node Elements 457
Interface Elements 952
Volume Elements 833
Date Time 5/3/2010 2:02:48 PM
Paused False
ICMP Status Polling Index 2242 out of 2242
SNMP Status Polling Index 2242 out of 2242
ICMP Status Polls per second 0.5
SNMP Status Polls per second 3
Max Status Polls Per Second 30
DNS Outstanding 0
ICMP Outstanding 0
SNMP Outstanding 24
ICMP Statistic Polling Index 3540 out of 3540
SNMP Statistic Polling Index 3540 out of 3540
ICMP Statistic Polls per second 0
SNMP Statistic Polls per second 9
Max Statistic Polls Per Second 30
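
One rough way to read these figures is to compare the observed polling rates with the configured maximums. The Python sketch below does that for the numbers pasted above. Treating the ICMP and SNMP rates as sharing a single maximum is an assumption made for illustration, not documented NPM behavior, and the function names are hypothetical.

```python
import re

# Excerpt of the figures pasted above (labels copied from the Polling Engines page).
STATS = """\
ICMP Status Polls per second 0.5
SNMP Status Polls per second 3
Max Status Polls Per Second 30
ICMP Statistic Polls per second 0
SNMP Statistic Polls per second 9
Max Statistic Polls Per Second 30
"""


def parse_rates(text):
    """Map each 'label value' line to a float, keyed by the label."""
    rates = {}
    for line in text.splitlines():
        match = re.match(r"(.+?)\s+([\d.]+)$", line.strip())
        if match:
            rates[match.group(1)] = float(match.group(2))
    return rates


def load_fraction(rates, kind):
    """Fraction of the configured maximum in use for 'Status' or 'Statistic'.

    Assumption: ICMP and SNMP polls of one kind count against the same maximum.
    """
    used = (rates[f"ICMP {kind} Polls per second"]
            + rates[f"SNMP {kind} Polls per second"])
    return used / rates[f"Max {kind} Polls Per Second"]


rates = parse_rates(STATS)
for kind in ("Status", "Statistic"):
    print(f"{kind}: {load_fraction(rates, kind):.0%} of configured maximum")
```

Under that assumption, both rates sit well below the configured maximum of 30 polls per second.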

ecklerwr1 over 13 years ago in reply to FormerMember

Your poller looks pretty good considering how many elements you have on it. The next thing:
Are you running your SQL server on the same box or a separate one? Since your issue seems to be intermittent, the next thing I would look at is running perfmon on the NPM server and letting it run for a while, watching counters like the disk queue length to see whether the hardware is straining at the point where your ms response times jump through the roof.
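
If a scripted capture is easier to line up against the ping spikes than the perfmon GUI, the same counter can be sampled with the Windows typeperf command-line tool. This is only a sketch: the counter path, the sampling interval, and the helper names are assumptions chosen for illustration.

```python
import csv
import io
import subprocess

# Assumed counter: average disk queue length across all physical disks.
COUNTER = r"\PhysicalDisk(_Total)\Avg. Disk Queue Length"


def typeperf_command(interval_s=5, samples=12):
    """Build a typeperf call: one sample every interval_s seconds, samples times."""
    return ["typeperf", COUNTER, "-si", str(interval_s), "-sc", str(samples)]


def parse_typeperf(output):
    """Return (timestamp, value) pairs from typeperf's CSV output."""
    rows = []
    for row in csv.reader(io.StringIO(output)):
        if len(row) != 2:
            continue  # blank lines and single-field status messages
        try:
            rows.append((row[0], float(row[1])))
        except ValueError:
            continue  # header row, or a row whose second field is not a number
    return rows


def sample_disk_queue(interval_s=5, samples=12):
    """Run typeperf on the local Windows machine and return the parsed samples."""
    result = subprocess.run(typeperf_command(interval_s, samples),
                            capture_output=True, text=True)
    return parse_typeperf(result.stdout)
```

Calling `sample_disk_queue()` on the NPM server while a `ping -t` is running gives timestamped queue lengths that can be compared with the moments the response times jump.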

FormerMember over 13 years ago in reply to ecklerwr1

SQL is running on a separate server.
I guess when I say the problem is intermittent, I mean that the ping times aren't consistent.
If I run "ping servername -t", the first 2 pings will come back at 5ms, then the next 10 will be at 5000ms, then the next 1 will be 5ms, then the next 30 will be 3500ms, and so on.
I have updated the NIC drivers on the server, thinking that might have something to do with it, and checked the network port on the switch for errors. It just doesn't make sense unless it is a flaky network card.
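
To make a pattern like this easier to compare between test runs, the reply times can be collapsed into runs of fast and slow replies. A minimal Python sketch follows; the 1000 ms threshold and the sample values are illustrative assumptions shaped like the description above, not captured data.

```python
from itertools import groupby


def latency_runs(samples_ms, threshold_ms=1000):
    """Collapse consecutive replies into (is_slow, count) runs."""
    return [
        (slow, len(list(group)))
        for slow, group in groupby(samples_ms, key=lambda ms: ms >= threshold_ms)
    ]


# Illustrative values only: 2 fast, 10 slow, 1 fast, 30 slow.
samples = [5] * 2 + [5000] * 10 + [5] + [3500] * 30
print(latency_runs(samples))
# [(False, 2), (True, 10), (False, 1), (True, 30)]
```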

ecklerwr1 over 13 years ago in reply to FormerMember

So I assume this is also causing the response time graphs for all your nodes in NPM to show high ms values. Also, when you ping the NPM server from your client machine with the -t option, do the ping times rise sporadically as well? I would pretty much assume they would. It definitely sounds like you've narrowed the issue down to the server NPM is running on. One way to separate the NPM processes from the server hardware underneath them would be to stop all of the NPM services and then ping -t the NPM server from your client. If the ping times still rise while the services aren't running, that rules out the NPM services (e.g. an overloaded poller) as part of the problem. I hope that makes sense.

John.Taylor over 13 years ago

When you run "ping -t" from the command line, is there an actual five-second pause when it reports 5000, or do the replies arrive at the same interval as the ones with smaller numbers?
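
One way to answer that without watching the console is to time each ping from the outside and compare it with the figure ping itself prints. The Python sketch below assumes the Windows ping output format ("time=5ms" or "time<1ms") and the Windows `-n` count flag; the function names are hypothetical. The wall-clock figure also includes process start-up and name resolution, so only large differences are meaningful.

```python
import re
import subprocess
import time


def reported_ms(ping_output):
    """Extract the reply time from Windows-style ping output, or None if absent."""
    match = re.search(r"time[=<](\d+)\s*ms", ping_output)
    return int(match.group(1)) if match else None


def timed_ping(host):
    """Send one echo request and return (reported_ms, wall_clock_ms)."""
    start = time.perf_counter()
    result = subprocess.run(["ping", "-n", "1", host],
                            capture_output=True, text=True)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return reported_ms(result.stdout), elapsed_ms
```

If `timed_ping("servername")` reports around 5000 ms while the wall-clock time stays small, the delay is probably in how the time is being measured rather than on the wire.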

FormerMember over 13 years ago

I wanted to give an update on this. We were able to fix the problem, and I figured I would post our fix here in case anyone else ever has the same issue. It had to do with the IP stack in Windows.
We ran the following command to fix it:
netsh int ip reset c:\resetlog.txt

ecklerwr1 over 13 years ago in reply to FormerMember
Btrotter-
Excellent, so glad to hear you got the problem resolved and that it didn't require too much. Apparently the command resets a couple of registry keys and has the effect of reinstalling the protocol.
The reset command is available in the IP context of the NetShell utility. Follow these steps to use the reset command to reset TCP/IP manually:
1. To open a command prompt, click Start and then click Run. Copy and paste (or type) the following command in the Open box and then press ENTER:
   cmd
2. At the command prompt, copy and paste (or type) the following command and then press ENTER:
   netsh int ip reset c:\resetlog.txt
   Note: If you do not want to specify a directory path for the log file, use the following command:
   netsh int ip reset resetlog.txt
3. Reboot the computer.
When you run the reset command, it rewrites two registry keys that are used by TCP/IP. This has the same result as removing and reinstalling the protocol. The reset command rewrites the following two registry keys:
SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\
SYSTEM\CurrentControlSet\Services\DHCP\Parameters\

To run the manual command successfully, you must specify a file name for the log, in which the actions that netsh takes will be recorded. When you run the manual command, TCP/IP is reset and the actions that were taken are recorded in the log file, known as resetlog.txt in this article.

The first example, c:\resetlog.txt, creates a path where the log will reside. The second example, resetlog.txt, creates the log file in the current directory. In either case, if the specified log file already exists, the new log will be appended to the end of the existing file.
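
For anyone who wants to script the reset across several machines, a small Python wrapper could look like the sketch below. It only builds the command quoted in this thread; the function names and the dry-run default are assumptions added for safety, and a reboot is still needed afterwards.

```python
import subprocess


def netsh_reset_command(log_path=r"c:\resetlog.txt"):
    """Build the reset command quoted above; a log file name must be given."""
    if not log_path:
        raise ValueError("netsh int ip reset needs a log file name")
    return ["netsh", "int", "ip", "reset", log_path]


def reset_ip_stack(log_path=r"c:\resetlog.txt", dry_run=True):
    """Print the command by default; run it only when dry_run is False."""
    command = netsh_reset_command(log_path)
    if dry_run:
        print(" ".join(command))
        return None
    return subprocess.run(command, check=True)
```

Calling `reset_ip_stack()` prints the command without touching the system; pass `dry_run=False` from an administrative prompt to actually run it.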
