
My Orion website and server are very slow / CPU and memory spikes / polling gaps: what should I check quickly?

This article gives a quick way to check the health of your current environment. It will help you address the most common causes of performance issues on your server without sending diagnostics to SolarWinds Support.

You can also use it to audit your own environment quickly: check whether it has been set up according to the SolarWinds MINIMUM requirements, and adjust the settings that create bottlenecks and performance issues.

It also saves the time needed to upload diagnostics to Support. Where there is an air gap between you and the server, you can run the basic health check on the server itself.

Checking the environment health internally is also useful where security procedures do not allow uploading diagnostics to SolarWinds Support.


Your check list

Server Hardware

Total elements (nodes/interfaces/volumes) being polled per server

Check free disk space on the Orion server and SQL Server

Check your server polling rate

SQL Server / Orion DB Size / Settings / Options

Check SQL Server Disk Performance

Orion Antivirus directory exclusion

Webpages Customization

Let's go!

Collect System diagnostics as below.

Navigate to Start -> SolarWinds Orion -> Documentation and Support

Launch Orion Diagnostics (the gray icon) and click "Start".

This program generates a .zip file as output.

Unzip it into a folder: right-click the file and select Extract Here.

Server Hardware

Let's first check whether your system hardware is anywhere near the SolarWinds MINIMUM recommendation.

Go to the SystemInformation folder > Open the SystemInfo.txt file

pastedImage_2.png

There you will find the system hardware specification. Below is an example where the system is assigned only two PHYSICAL CPU SOCKETS, which is below the SolarWinds MINIMUM recommendation.

You must have a MINIMUM of 4 PHYSICAL CPU SOCKETS here.

System Type:               x64-based PC

Processor(s):              2 Processor(s) Installed.

                           [01]: Intel64 Family 6 Model 45 Stepping 7 GenuineIntel ~1600 Mhz

                           [02]: Intel64 Family 6 Model 45 Stepping 7 GenuineIntel ~1400 Mhz

Total Physical Memory:     49.082 MB

Available Physical Memory: 39.408 MB

Virtual Memory: Max Size:  56.250 MB

Virtual Memory: Available: 45.376 MB

Virtual Memory: In Use:    10.874 MB
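If you would rather not read SystemInfo.txt by eye, the processor check can be scripted. The following is only a minimal Python sketch, not a SolarWinds tool: the function name `check_processors` and the two thresholds are my own, taken from the 4-socket and 3.0 GHz figures in this article, and it assumes the file contains `systeminfo`-style text like the example above.

```python
import re

# Thresholds taken from this article's recommendations (assumptions, adjust as needed).
MIN_PROCESSORS = 4   # minimum number of processors the article asks for
MIN_MHZ = 3000       # 3.0 GHz, the clock speed the article recommends

def check_processors(systeminfo_text):
    """Return (count, speeds_mhz, warnings) parsed from systeminfo-style text."""
    count_match = re.search(r"(\d+)\s+Processor\(s\)\s+Installed", systeminfo_text)
    count = int(count_match.group(1)) if count_match else 0
    # Each processor line ends with an approximate clock speed such as "~1600 Mhz".
    speeds = [int(s) for s in re.findall(r"~(\d+)\s*Mhz", systeminfo_text, flags=re.IGNORECASE)]

    warnings = []
    if count < MIN_PROCESSORS:
        warnings.append(f"only {count} processor(s) installed, minimum is {MIN_PROCESSORS}")
    slow = [s for s in speeds if s < MIN_MHZ]
    if slow:
        warnings.append(f"{len(slow)} processor(s) reported below {MIN_MHZ} MHz")
    return count, speeds, warnings

# Sample text copied from the example in this article.
sample = """\
Processor(s):              2 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 45 Stepping 7 GenuineIntel ~1600 Mhz
                           [02]: Intel64 Family 6 Model 45 Stepping 7 GenuineIntel ~1400 Mhz
"""

count, speeds, warnings = check_processors(sample)
print("processors:", count, "speeds (MHz):", speeds)
for warning in warnings:
    print("WARNING:", warning)
```

To run it against real diagnostics, read the contents of SystemInfo.txt from the extracted SystemInformation folder and pass the text to `check_processors`.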

Now open the SysInfo.csv file and check the current CPU load on the system and the CPU clock speed (GHz).

pastedImage_1.png

Below is an example where the CPU load on the system is around 70%, for the two main reasons listed after the table.

Parameter                Value

OSVersion                Windows Server 2012 R2 (Microsoft Windows NT 6.2.9200.0)

CPUInformation           Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz

CurrentCPUUssage         70 %

TotalPhysicalMemmory     49152 MB

FreePhysicalMemmory      39802 MB

FreeVirtualMemmory       45843 MB

FreeSpaceInPagingFiles   7109 MB

CurrentTimeZone          xxxx Standard Time (UTC+01:00:00)

Too few physical sockets assigned

Low CPU clock speed, less than 3.0 GHz
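The same two values can be pulled out of SysInfo.csv with a few lines of Python. This is a rough sketch under assumptions: I am assuming the file is a plain two-column CSV with a `Parameter,Value` header row, and the helper names `read_sysinfo` and `cpu_findings` are hypothetical. The field names, including their spelling, are copied from the table above.

```python
import csv
import io
import re

def read_sysinfo(csv_text):
    """Read Parameter/Value rows into a dict (assumes a header row and two columns)."""
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if len(row) >= 2]
    return {row[0].strip(): row[1].strip() for row in rows[1:]}  # rows[0] is the header

def cpu_findings(info):
    """Return (cpu_usage_percent, clock_ghz) from the parsed values."""
    usage = int(re.sub(r"[^\d]", "", info["CurrentCPUUssage"]))  # "70 %" -> 70
    ghz_match = re.search(r"([\d.]+)\s*GHz", info["CPUInformation"])
    ghz = float(ghz_match.group(1)) if ghz_match else None
    return usage, ghz

# Sample rows copied from the example in this article.
sample = (
    "Parameter,Value\n"
    "CPUInformation,Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz\n"
    "CurrentCPUUssage,70 %\n"
)

usage, ghz = cpu_findings(read_sysinfo(sample))
print(f"CPU usage: {usage}%, clock speed: {ghz} GHz")
if ghz is not None and ghz < 3.0:
    print("WARNING: clock speed is below the 3.0 GHz recommended in this article")
```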

You should see a MINIMUM of 4 physical processor sockets, as below.

Strongly recommended: do NOT use a processor slower than 3.0 GHz. You will never get the performance you are looking for, even when neither the host nor the guest shows the CPU as busy.

Most likely you will see CPU spikes and Orion services consuming high CPU and memory. Once you move the same VM to a processor faster than 3.0 GHz, all of the above symptoms will be resolved.

With a processor slower than 3.0 GHz there may be other issues as well, such as SQL Server TCP connection timeout errors and a large amount of data stored in MSMQ on the system.

pastedImage_0.png

Make sure you have a host of MINIMUM 3.0 GHz with Hyper-Threading active. It will improve guest performance significantly and you will get full performance out of the SolarWinds application.

pastedImage_2.png

This is how you set up your VM in ESX

pastedImage_0.png

Here is an example of assigning the number of CPU SOCKETS to the VM

pastedImage_0.png

System Model:              VMware Virtual Platform

System Type:               x64-based PC

Processor(s):              4 Processor(s) Installed.

                           [01]: Intel64 Family 6 Model 15 Stepping 1 GenuineIntel ~3493 Mhz

                           [02]: Intel64 Family 6 Model 15 Stepping 1 GenuineIntel ~3493 Mhz

                           [03]: Intel64 Family 6 Model 15 Stepping 1 GenuineIntel ~3493 Mhz

                           [04]: Intel64 Family 6 Model 15 Stepping 1 GenuineIntel ~3493 Mhz

BIOS Version:              Phoenix Technologies LTD 6.00, 4/14/2014

If you are on Hyper-V, you can adjust the VM sockets under the NUMA settings. For more details, please see the posts below.

Hyper-V Design for NUMA Architecture and Alignment - | Exit | the | Fast | Lane |

https://www.starwindsoftware.com/blog/a-closer-look-at-numa-spanning-and-virtual-numa-settings


pastedImage_1.png

Next, check how much memory is assigned to and available on the system, and check in Task Manager which application is consuming the most memory.

In the above case the system hardware is nowhere near the recommended SolarWinds production deployment, so the CPU load and system resource usage will remain high.

The following table lists minimum hardware requirements and recommendations for your SolarWinds Orion server.

Installing multiple SolarWinds Orion Platform products on the same computer may change the requirements.

Hardware requirements are listed by SolarWinds NPM license level.

NPM hardware requirements

These minimum requirements are for the Orion Platform. Products that run on the Orion Platform may have different requirements, such as different OS or memory requirements.

Consult your product-specific documentation for the exact requirements.

Hardware     SL100, SL250, SL500                       SL2000                                    SLX

CPU speed    Quad core processor, 2.5 GHz or better    Quad core processor, 2.5 GHz or better    Quad core processor, 3.0 GHz or better

For more details see below guide

NPM 12.0 system requirements - SolarWinds Worldwide, LLC. Help and Support

Check free disk space on the Orion Server and SQL Server

Make sure you have plenty of free space on the Orion server disks: the C drive and the drive holding the installation directory.

Make sure you have plenty of free space on the SQL Server disks where the actual database is stored.
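A quick way to put a number on "plenty of free space" is Python's standard `shutil.disk_usage`. This is a small sketch only: the 20% threshold is my own assumption (this article does not state a figure), and `free_percent` is a hypothetical helper name.

```python
import shutil

MIN_FREE_PERCENT = 20.0  # assumed threshold, not a SolarWinds figure

def free_percent(path):
    """Return the percentage of free space on the disk that holds `path`."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.free / usage.total

# "." checks the current drive; on the Orion or SQL server you would list
# paths such as "C:\\" and the drive that holds the database files.
for path in ["."]:
    percent = free_percent(path)
    status = "OK" if percent >= MIN_FREE_PERCENT else "LOW"
    print(f"{path}: {percent:.1f}% free [{status}]")
```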

Total elements (nodes/interfaces/volumes) being polled per server

Go to folder "DB" > Open file "AllEngines.csv"

pastedImage_4.png

Check how many Elements you are polling per server

EngineID   Elements   Nodes   Interfaces   Volumes

1          15828      934     6823         1071

2          16084      202     1305         77

With a single SolarWinds SLX license you can monitor up to 12,000 elements; beyond this you will need an Additional Polling Engine.

For more details, see the Server Sizing guide.

Server Sizing recommendations

Use additional polling engines for 12,000 or more monitored elements

If you plan to monitor 12,000 or more elements, SolarWinds recommends that you install additional polling engines on separate servers to help distribute the work load.
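To flag overloaded engines without reading AllEngines.csv row by row, something like the following Python sketch could be used. It assumes the file has at least the `EngineID` and `Elements` columns shown in the example above (the real file may contain more columns), and `engines_over_limit` is a hypothetical helper name.

```python
import csv
import io

ELEMENT_LIMIT = 12000  # per-server figure quoted in this article

def engines_over_limit(csv_text, limit=ELEMENT_LIMIT):
    """Return (EngineID, Elements) pairs for engines polling more than `limit` elements."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        (row["EngineID"], int(row["Elements"]))
        for row in reader
        if int(row["Elements"]) > limit
    ]

# Sample rows copied from the example in this article.
sample = (
    "EngineID,Elements,Nodes,Interfaces,Volumes\n"
    "1,15828,934,6823,1071\n"
    "2,16084,202,1305,77\n"
)

over = engines_over_limit(sample)
for engine_id, elements in over:
    print(f"Engine {engine_id}: {elements} elements, over the {ELEMENT_LIMIT} limit")
```

In the example data both engines are above 12,000 elements, so both are reported.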

I would also strongly advise checking the blog post below if you have any other questions about polling beyond 12,000 elements with a single SLX server.

Boost your server polling capacity with Stackable Poller license

Multi-module system guidelines

Check your Server Polling Rate

Go to Settings > Polling Engines .

Check whether any of the polling rates has increased.

Make sure none of the polling rates exceeds 100%.

POLLING COMPLETION: 100
ELEMENTS: 225
NETWORK NODE ELEMENTS: 18
VOLUME ELEMENTS: 50
INTERFACE ELEMENTS: 157
POLLING RATE: 2% of its maximum rate.
ROUTING POLLING RATE: 0% of its maximum rate.
HARDWARE HEALTH POLLING RATE: 0% of its maximum rate.
VIM.VMWARE.POLLING: 2
F5 POLLING RATE: 0% of its maximum rate.
WIRELESS HEAT MAP POLLING RATE: 0% of its maximum rate.
WIRELESS POLLING RATE: 0% of its maximum rate.
UNDP POLLING RATE: 0% of its maximum rate.
SAM APPLICATION POLLING RATE: 170% of its maximum rate.

If any polling rate has risen above 100%, you will notice high CPU and memory utilization on the system, which can affect system and application performance.
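If you copy the polling engine details from the web page into a text file, the rates above 100% can be picked out automatically. This is a minimal Python sketch that assumes the text looks like the lines shown above (a label ending in "RATE" followed by a percentage); the function name `overloaded_rates` is my own.

```python
import re

# Matches lines such as "SAM APPLICATION POLLING RATE 170% of its maximum rate."
# The optional colon and whitespace allow for slightly different copy/paste layouts.
RATE_LINE = re.compile(r"^\s*(?P<name>.+?RATE):?\s*(?P<pct>\d+)%", re.MULTILINE)

def overloaded_rates(text, limit=100):
    """Return {rate name: percent} for every polling rate above `limit` percent."""
    rates = {m.group("name").strip(): int(m.group("pct")) for m in RATE_LINE.finditer(text)}
    return {name: pct for name, pct in rates.items() if pct > limit}

# Sample lines based on the example in this article.
sample = """\
POLLING RATE 2% of its maximum rate.
ROUTING POLLING RATE 0% of its maximum rate.
SAM APPLICATION POLLING RATE 170% of its maximum rate.
"""

for name, pct in overloaded_rates(sample).items():
    print(f"WARNING: {name} is at {pct}% of its maximum rate")
```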

Orion DB Size and settings

Go to the DBInfo Folder > Open DatabaseInfo.csv file

pastedImage_0.png

Check the Database Recovery Mode

Check the Total Database Size

The database recovery model should be SIMPLE (strongly recommended)

name              db_size        status

SolarWindsOrion   889274.25 MB   Recovery=FULL
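The two values to look at can also be read with a short script. The sketch below is an illustration only: it assumes the `status` and `db_size` fields look like the example above (`Recovery=FULL`, `889274.25 MB`), and the helper names `recovery_model` and `size_gb` are hypothetical.

```python
import re

def recovery_model(status):
    """Extract the recovery model from a status string such as 'Recovery=FULL'."""
    match = re.search(r"Recovery=(\w+)", status)
    return match.group(1).upper() if match else None

def size_gb(db_size):
    """Convert a size string such as '889274.25 MB' to gigabytes."""
    return float(db_size.split()[0]) / 1024.0

# Values copied from the example row in this article.
name, db_size, status = "SolarWindsOrion", "889274.25 MB", "Recovery=FULL"

model = recovery_model(status)
print(f"{name}: {size_gb(db_size):.1f} GB, recovery model {model}")
if model != "SIMPLE":
    print("WARNING: this article strongly recommends the SIMPLE recovery model")
```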

  • I have a very large Orion database. What should I check?
  • I have database performance issues. How can I improve my database performance?
  • Why is my Orion database growing so quickly?
  • Why does my Orion performance decrease each day?
  • Why is my Orion Web Console loading data at a slow rate?

For more details please see the post below and follow all the steps one by one to check your Orion Database Health and settings.

This guide will help you address the most common questions and issues related to the Orion database performance check and configuration without using the SolarWinds Database Administrator (DBA).

Quick Orion database health check guide

Check SQL Server Disk Performance

Orion Antivirus directory exclusion for NPM

Web pages recommended settings

Still have questions or need assistance?

Please feel free to submit a new support ticket in relation to your question/error. Our support lines are available 24/7.

http://www.solarwinds.com/support/ticket

You can also contact Support by phone, 24/7.

http://www.solarwinds.com/company/contact.aspx

Comments
Antivirus directory exclusions for NPM

Last Updated: February 17, 2017

Overview

To run SolarWinds products, you may need to exclude certain files, directories, and ports from anti-virus protection. This topic also lists service accounts that should be added for optimal performance and to allow all Orion products  the access to required files.

https://support.solarwinds.com/Success_Center/Network_Performance_Monitor_(NPM)/NPM_Administrator_Gu...

This is very helpful info, nicely put together Malik.

Thank you for your valuable feedback .

Was just facing this issue with a client the other day!!! Thanks for putting this together!!!

Thank you for the feedback

Is this article as adamant about physical sockets over cores as is written? It would appear to me that all instances of "physical socket" could be replaced by "non-hyperthreaded core" and be most accurate. Even the 12.1 and 12.2 systems requirements guides give recommendations in cores. I do appreciate the troubleshooting breakdown.

If you believe in the theory of constraints, then improving a system at any point other than the constraint is pointless, as you will become backlogged on that constraint wherever it is.

So, how can I measure the performance of the components of Orion to determine the constraint in the system?

That way, given the taxpayer dollars I have to spend, I can tell which ONE element I should spend the money on to improve the system.

You can use SAM to track each Orion service individually and determine which area you need to address.

https://thwack.solarwinds.com/docs/DOC-170557#start=25

111.png

Sorry, your question is unclear. Can you please explain what exactly your concern is?

My question is about the allowed configuration of a VM. Multiple times you differentiate between sockets and cores and say, "You must have to have MINIMUM 4 PHYSICAL CPU SOCKETS here ." I cannot think of any circumstance under VMware in which the OS, and thus Orion, actually differentiates between sockets and cores, only the total number of cores (e.g., 8 sockets x 1 core is as good as 1 socket x 8 cores). I believe the document could be improved by replacing all instances of "physical CPU socket" with "non-hyperthreaded core".

Richard, you may find value in the Hubble tool.

Hubble is a tool for analyzing web server response time, it helps you identify where your performance problem is.

https://support.solarwinds.com/Success_Center/Network_Performance_Monitor_(NPM)/Turn_on_and_use_Hubb...

This has been helpful. We have close to 200 users in SolarWinds during the day, and have 13 servers total in our Orion system. I've been using this as a guide to right-size our primary server. One question I have is regarding the additional web console server. Our AWS server is (seems like) a monster, but it's practically asleep (CPU and memory utilization), and carries 150 + users throughout the day. I have key perfmon counters from the top down, and I don't see any issues with my SQL server or Storage / disk IOPs / latency, however that AWS is slow as molasses. My DBA group swears the database server is rock solid, and the DC folks swear the storage is also not the problem. The counters I'm looking at in my Perf Analysis charts seem to back up their opinion.

I'm struggling with anecdotal evidence that 3GHz will solve my problem when I can't find any perfmon counters that substantiate my need to go to the well for more money. This one (similar to my primary) uses a 10 core 2.2 GHz Intel Xeon E5-266- V2 proc, and it scores 13286 on the passmark test, which is almost double the recommended score.

Any suggestions on what counters to look at that support upgrading this AWS server to 3GHz? What might I be missing? The primary server CPU has several cores sitting at 100% most of the day, and is practically unusable. When you restart the Orion services when it stops responding, it's pretty quick for a couple of hours, but the AWS is always slow. Because the CPU is pegged, this is an easy ask, but I'm struggling with what to do with our AWS server. My preference is to have people use that server over hitting our primary, but I'm having trouble getting people to adopt Orion as their go-to tool because it isn't usable, resulting in different groups maintaining their own tools for monitoring.

Any additional help or suggestions are greatly appreciated.

If your Additional Web Server is nowhere close to 100% utilization, then you will likely see zero benefit from adding additional CPU resources to that server. Website performance is primarily bound to SQL database server performance, but it can be (and is) similarly bound to the performance of the primary/main Orion server. For large environments, like those with numerous Additional Polling Engines, we typically recommend moving all polling off the primary Orion server and distributing that load to the Additional Pollers. This is because the main Orion server already has more than enough going on with alerting, SWIS, Pub/Sub RabbitMQ, etc., so farming out polling and the servicing of web requests to other servers is advisable.

This helps. I've always been curious how much of the primary engine performance would or could impact the AWS response times.

Case Study : Recently fixed this issue

Error: Unexpected Website Error while adding Node: (Settings > All Settings> Manage Nodes > Add Node )


All other pages were working perfectly fine

pastedImage_0.png

Service was unable to open new database connection when requested.

SqlException: Connection Timeout Expired.  The timeout period elapsed while attempting to consume the pre-login handshake acknowledgement.  This could be because the pre-login handshake failed or the server was unable to respond back in time.  The duration spent while attempting to connect to this server was - [Pre-Login] initialization=3; handshake=20004;
Connection string -

There was an error updating the Engine Keep alive record

Error Detail-System.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired.  The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out

You can find the cause of the errors in the log file below (OrionWeb.log).

C:\ProgramData\SolarWinds\Logs\Orion\OrionWeb.log

2019-03-25 08:12:44,762 [43] (15) ERROR SolarWinds.HardwareHealth.Web.UriConverters.HardwareItemUriConverterBase - (null)  There is no navigation part
*** Assembly App_Web_2xr4lyht, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null, .NET version v4.0.30319 ***
*** Assembly SolarWinds.Orion.Discovery.Contract, Version=2017.3.5300.1920, Culture=neutral, PublicKeyToken=null, .NET version v4.0.30319 ***
*** Assembly App_Web_lracp0mr, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null, .NET version v4.0.30319 ***
*** Assembly SolarWinds.SysMan.Utils, Version=1.1.0.34, Culture=neutral, PublicKeyToken=null, .NET version v4.0.30319 ***
2019-03-25 08:16:10,293 [106] (16) ERROR ASP.global_asax - (null)  Application_Error(6cf772452a184d5f88666d9256ed784f)
System.Web.HttpException (0x80004005): Request timed out.

*** Assembly App_Web_q01ln3ae, Version=0.0.0.0, Culture=neutral, PublicKeyToken=null, .NET version v4.0.30319 ***
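When the log is large, a short script can summarize which components are logging errors and how many timeouts appear. The following Python sketch is not an official tool: it assumes every log entry starts with the timestamp, thread, level and logger layout seen in the excerpt above, and `summarize_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Layout assumed from the excerpt: "2019-03-25 08:12:44,762 [43] (15) ERROR Logger.Name - ..."
ENTRY = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ \[\d+\] \(\d+\) "
    r"(?P<level>[A-Z]+)\s+(?P<logger>\S+) - "
)
TIMEOUT = re.compile(r"timed out|timeout expired", re.IGNORECASE)

def summarize_errors(lines):
    """Return (Counter of ERROR entries per logger, number of lines mentioning a timeout)."""
    per_logger = Counter()
    timeouts = 0
    for line in lines:
        match = ENTRY.match(line)
        if match and match.group("level") == "ERROR":
            per_logger[match.group("logger")] += 1
        if TIMEOUT.search(line):
            timeouts += 1
    return per_logger, timeouts

# Sample lines copied from the excerpt in this article.
sample = [
    "2019-03-25 08:12:44,762 [43] (15) ERROR SolarWinds.HardwareHealth.Web.UriConverters."
    "HardwareItemUriConverterBase - (null)  There is no navigation part",
    "2019-03-25 08:16:10,293 [106] (16) ERROR ASP.global_asax - (null)  "
    "Application_Error(6cf772452a184d5f88666d9256ed784f)",
    "System.Web.HttpException (0x80004005): Request timed out.",
]

per_logger, timeouts = summarize_errors(sample)
for logger, count in per_logger.most_common():
    print(f"{count} error(s) from {logger}")
print(f"{timeouts} line(s) mention a timeout")
```

For a real run, pass the lines of C:\ProgramData\SolarWinds\Logs\Orion\OrionWeb.log (for example from `open(path, errors="replace")`) instead of the sample list.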

You might see this issue in the SolarWinds Information Service logs as well.

2019-03-25 08:49:40,285 [29] (18) ERROR SolarWinds.InformationService.Contract2.InfoServiceProxy - (null)  Error closing exception.
System.ServiceModel.CommunicationObjectFaultedException: The communication object, System.ServiceModel.Channels.ServiceChannel, cannot be used for communication because it is in the Faulted state.
*** Assembly SolarWinds.IPAM.Client, Version=4.6.5300.924, Culture=neutral, PublicKeyToken=null, .NET version v4.0.30319 ***



Resolution :

Increasing the CPU sockets on the system resolved the issue.

The following steps also improved performance and resolved other application performance side issues:

NPM 12.4 Requirements

Check Free disk space on the Orion server and SQL server

Tweaking performance of Windows Server

Move SQL Server off the Orion server VM (running both on the same VM is not recommended for production deployment)


Kindly check your system hardware before you upgrade and make sure it matches the MINIMUM recommendation.

Please also use a 3.4 GHz CPU rather than one slower than 3.0 GHz.



sagar.bdefieguy​  gangadhar.k

Thanks for the article

You should never oversubscribe virtual CPU sockets. When you rightsize a VM, the socket count should never exceed the actual socket count available on the host.

So unless your physical host server has 4x CPU sockets, the socket count should remain at 2.

Logical cores can be considered as running in parallel, so they can be added.
The reason is that during typical CPU operation you will almost never see continuous execution of a single thread on every clock cycle. There are always gaps when one logical core is waiting for something and the second logical core can kick in and do its job.

There is no problem configuring it this way, and there is no hard and fast rule on this. The best configuration for you depends on your environment, and you should also consider the point at which it becomes stable.

Just resolved a case :

Customer issue:

CPU increases gradually and is resolved after a restart.

Customer environment

Currently we have high CPU utilization. We have 16 vCPUs (E5649 2.53 GHz)

System Manufacturer:       VMware, Inc.

System Model:              VMware Virtual Platform

System Type:               x64-based PC

Processor(s):              2 Processor(s) Installed.
                           [01]: Intel64 Family 6 Model 37 Stepping 1 GenuineIntel ~2533 Mhz
                           [02]: Intel64 Family 6 Model 37 Stepping 1 GenuineIntel ~2533 Mhz
BIOS Version:              Phoenix Technologies LTD 6.00, 9/21/2015

Windows Directory:         C:\Windows

System Directory:          C:\Windows\system32

Boot Device:               \Device\HarddiskVolume1

System Locale:             en-us;English (United States)

Parameter          Value
OSVersion          Windows Server 2016 (Microsoft Windows NT 6.2.9200.0)
CPUInformation     Intel(R) Xeon(R) CPU           E5649  @ 2.53GHz
CurrentCPUUssage   93%

pastedImage_0.png

Resolved:

Added 2 more CPU SOCKETS to the VM and moved the VM to a 3.4 GHz ESX host.

Issue resolved; the system has been verified as back in a stable condition.

Version history
Revision #:
1 of 1
Last update:
‎12-11-2016 03:20 AM