
Anyone else have problems with high response times?

First off, I apologize for the lengthy post.

tl;dr version: high response times, lots of troubleshooting, can't figure it out.

I've had a response time problem with all the nodes in my network for years now (since at least 6.x) and I'm confused as to why it's happening. Orion records response times much higher than what a CLI session shows, even over the course of 30 minutes to an hour. I've tried everything I can think of to get Orion to report somewhat normal response times.

- Tuned the poller to slightly above the recommended maximum (usually by no more than 5 polls per second) so that I can add new nodes without re-tuning every week. Configuring it to exactly the recommended settings makes no discernible difference in the results.
- Tried NIC settings such as IP checksum offload, on the theory that the server's processors might handle checksum verification faster than the processor on the NIC. No change.
- Configured my direct upstream firewall (which shows response times in Orion of 45ms!) to treat ICMP as high-priority traffic. This had the opposite effect: response times jumped into the hundreds of milliseconds in Orion, but not in the CLI.
- Set the ping data portion in Orion NPM Settings/Network to 0 bytes. Unfortunately, one of my nodes stopped responding when I did this, so I had to set it to a minimum of 18 bytes. No change whatsoever.
- Took Wireshark captures of the pings and saw that they leave the Orion server and the replies arrive mostly within 1 to 2 ms, which is what the CLI reports. Orion mysteriously records these times as tens of milliseconds. See edit below.
- Checked duplex and speed settings: no ports are misconfigured, and none show errors.

I'm not entirely convinced that the problem is with Orion or the server itself, but I'm not ruling it out. It seems that everything I do to pin down the problem area leads to conflicting results, sending me back to square one. For instance, response times for a box which is on the same subnet as the Orion server are what I would expect (<1ms), so that would lead one to believe the problem is further upstream. However, after performing the QoS configuration on my firewall, Orion response time recording showed increased latency while CLI response times remained the same. I would expect that if I had misconfigured something, then my CLI response times would reflect an increase just like Orion.

This response time problem has some pretty important repercussions. We can't provide this data to our customers as SLA proof that we're "up to speed", so to speak. I also can't configure alerts for abnormally high node response times; on our regional network, 15 to 20ms is considered abnormal.

Has anyone experienced these same issues and been able to resolve them? I'm willing to try just about anything to get these values to normal levels. My biggest question for SolarWinds is: does Orion use anything besides ping times to calculate response time? For instance, SNMP response time, database write time, etc.

EDIT: Actually, comparing the times in Wireshark for Orion pings and CLI pings shows vastly different results. The Orion pings do show responses coming in between 10 and 75ms (I was reading the times wrong originally). The CLI pings, by comparison, show proper response times in the single digits (<3ms) in Wireshark. Furthermore, the CLI ping packets are larger than the Orion pings (74 bytes for the CLI compared to 62 bytes for Orion).
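For anyone wondering what the two frame sizes mean, here is a minimal sketch of the arithmetic. It assumes untagged Ethernet, an IPv4 header without options, and frame lengths as Wireshark displays them (no frame check sequence); the function name is just for illustration.

```python
# Header sizes for an ICMP echo over IPv4 on untagged Ethernet, as Wireshark
# displays the frame (no preamble, no frame check sequence).
ETHERNET_HEADER = 14
IPV4_HEADER = 20       # assumption: no IP options present
ICMP_ECHO_HEADER = 8


def icmp_payload_bytes(frame_len: int) -> int:
    """Return the ICMP data length implied by a captured frame length."""
    overhead = ETHERNET_HEADER + IPV4_HEADER + ICMP_ECHO_HEADER
    if frame_len < overhead:
        raise ValueError(f"frame of {frame_len} bytes is smaller than the headers")
    return frame_len - overhead


if __name__ == "__main__":
    for label, frame_len in (("CLI", 74), ("Orion", 62)):
        data = icmp_payload_bytes(frame_len)
        print(f"{label}: {frame_len}-byte frame carries {data} bytes of ICMP data")
```

Under those assumptions the 74-byte CLI frames carry 32 bytes of ping data (the Windows ping default) and the 62-byte Orion frames carry 20 bytes, so the size difference is just the payload setting.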


bleearg13

I hate doing this, but...bump? I'm looking for a possible answer from SW about the latency discrepancy on the network level between Orion and the CLI.

Dresden

As a suggestion: can you run a Wireshark capture on the Orion box? Let it sit for a good 10 or 15 minutes, then take a look at the transfer times. See how they match up at that point, then let us know. If you still have issues, please open a ticket with Support and we'll troubleshoot it further.

Thanks...

Philip

bleearg13

Dresden,

That show was great, I wish it was still on.

This is pretty much what I've already done, so I guess I'll open a case. What I'm seeing is that only the pings coming from Orion are experiencing this problem, but not from the CLI. I've seen various posts from others on this topic, but none really have a satisfactory resolution.

vhcato

We too are seeing the same thing on a new instance of 9.1 SP5. We've never seen this in previous releases, but it's quite obvious now.

bleearg13

I have had a ticket open on it for three days and have yet to receive much of a response. I've seen this problem since at least 6.x, but always chalked it up to poor server hardware.

Harsem

Hello bleearg13,

Thank you for your post - these are the exact same symptoms we are experiencing in our environment. I thought it might have something to do with us running SolarWinds on ESX hosts - but whenever we pinged from the SolarWinds VM, the times were sub-1ms.

Hence I am glad that it is SolarWinds that seems to be wrong, rather than having to doubt our LAN.

Our delays that we are seeing range from 0ms to 6ms during all times of the day (including overnight when there is very little activity).

Is it worth us opening another ticket for this when it is a known issue?

Jens

jayd

I'm glad I saw this thread. I was preparing to go into the data center at 4:30am this morning to do a cold shutdown of my switches. Orion has reported high response times on just about every node being monitored since upgrading to 9.5. This includes switch ports, servers, firewalls and routers. There was never an issue with this prior to the upgrade.

I performed the upgrade from 9.1 SP5 to 9.5 last week on our production system. It went smoothly and without issue. Everything else seems to be working fine. Our Orion server is a VM configured with dual Xeon processors and 2GB RAM, using a dedicated NIC on the host. It runs 32-bit Windows 2003 Standard. SQL 2005 is 64-bit, running on its own dedicated physical server.

If I reboot the Orion server, the high response time alerts seem to diminish to just a few for a period (maybe 2 to 3 hours) and then will begin reappearing with regularity. The only thing I haven't done since the upgrade is reboot the SQL server. I'm going to do that this morning.

Jay

bleearg13

I cannot say with any certainty that either of these two latest reported issues is the same as mine. In my case, the response time is in the tens or hundreds of milliseconds. It has also been happening for years; it didn't just start after an upgrade to 9.5. And my high response times are constant and never seem to subside.

If I were you, jayd and Harsem, I would perform the same tests I did, to try and isolate the cause:

Pick a node to monitor via ICMP.
Start a Wireshark capture on the Orion server for that node, filtering only for ICMP (e.g. the capture filter icmp and host 192.168.5.8).
Begin a constant ping from the Orion server to that node.
Let it run for about 30 minutes to an hour.

In the Wireshark capture, you can typically differentiate the two separate types of pings by both the size of the packet and the data contained within it. If you set your Time Display Format to "Seconds since previous captured packet", you can get an idea as you scroll through the pings as to how much of a difference there is between the Orion pings and the CLI pings. Mine was pretty obvious.
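If scrolling through the capture gets tedious, the comparison can be roughed out in a few lines. This is only a sketch: it assumes you have already exported the ICMP packets into tuples of (time in seconds, "request" or "reply", ICMP identifier, sequence number, frame length), and the sample values below are made up purely to show the shape of the data. It pairs each echo request with its reply and groups the round-trip times by frame length, which is how the Orion and CLI pings can be told apart.

```python
from collections import defaultdict
from statistics import median


def rtts_by_frame_len(packets):
    """Pair echo requests with replies and group RTTs (in ms) by frame length.

    packets: iterable of (time_s, kind, ident, seq, frame_len) tuples in
    capture order, where kind is "request" or "reply".
    """
    pending = {}                 # (ident, seq) -> (send time, frame length)
    rtts = defaultdict(list)     # frame length -> list of RTTs in ms
    for time_s, kind, ident, seq, frame_len in packets:
        key = (ident, seq)
        if kind == "request":
            pending[key] = (time_s, frame_len)
        elif kind == "reply" and key in pending:
            sent_at, request_len = pending.pop(key)
            rtts[request_len].append((time_s - sent_at) * 1000.0)
    return dict(rtts)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from a real capture.
    sample = [
        (0.0000, "request", 1, 1, 74),
        (0.0015, "reply", 1, 1, 74),
        (0.5000, "request", 2, 1, 62),
        (0.5400, "reply", 2, 1, 62),
        (1.0000, "request", 1, 2, 74),
        (1.0020, "reply", 1, 2, 74),
    ]
    for frame_len, values in sorted(rtts_by_frame_len(sample).items()):
        print(f"{frame_len}-byte pings: n={len(values)}, "
              f"median={median(values):.2f} ms, max={max(values):.2f} ms")
```

Requests that never get a reply are simply left unpaired here; a fuller version would want to count those as timeouts.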

bshopp

We are looking into this to see what is going on. If you don't already have a support case open, please open one and include diagnostics.

neocatalyst

Jayd

Did you ever get this resolved? I am having the exact same problem and my setup is nearly identical to yours. My NPM and SQL are running on two separate VMs.

Much Thanks

Pat

bleearg13

I know there is an open ticket on the issue, but no resolution as of now.

RussKnapp

I'm wondering, too, if you got an answer to this. We just noticed that our latencies, as shown in the response times table on the 'Top 10' page, are 60-80% higher than they were a month or so ago. If I ping the device from my workstation, through our IPsec tunnel, I see latencies in line with our experience, and much lower than those shown in the Response Times table. It is as if a recent upgrade caused Orion NPM to measure response time differently.

Any light anyone can shed on this is appreciated. Russ.