
Orion Web Server Scaling


Hi Thwacksters,

Has anyone dealt with an environment where more than 100 Orion web console users access the monitoring platform (including NPM, NTA, NCM, VNQM, IPAM, UDT, SAM, WPM, SRM and DPA)?

The users are segregated as Network, Server, Application, DB and Platform users. Each of the teams has separate users that need to monitor their respective devices.

We have split the users amongst 2 web servers - one sitting on the main platform and an Additional Web server. The main platform does not run the SQL DB services - the Orion database has been built on a separate server.

We observe that CPU utilization on the core Orion platform reaches 100% very frequently. We often have to stop and restart the Orion services to bring the CPU utilization down.

Note that we have followed best practices by providing more than the recommended CPU, RAM and disk IOPS for the core platform, the database server and the Additional Web Server, as well as the 6 Additional Polling Engines.

Any pointers to articles that may help us design a better Web Access mechanism to the Orion monitoring are most welcome.

regards

gangadhar.kcsameerc_sameerabdhijasharma

1 Solution

The biggest limitation of the web server is that it is 32-bit only, so once the process passes 1.5 GB it usually crashes. This can happen with anywhere from 1 to 50 users, depending on the types of views they use.

You mentioned you have SAM, so I would suggest adding the AppInsight for IIS monitor on the additional web server and the main server to evaluate exactly how many unique connections you have and what the RAM usage is for IIS.

Here are a couple of things you can do to minimize web load:

- make sure all accounts have a session timeout

- customize each view and eliminate unneeded widgets.

- change all charts to load a maximum of 24 hours of data instead of 7 days.

- check the scheduled reports and try to move them to a reporting or BI platform running directly on the DB.

If you really have 100 concurrent sessions you need at the very least 1 more AWS. The licence is pretty cheap (~500 EUR); the only catch is that you can't stack them, so you will need a dedicated 2 CPU, 8 GB RAM Windows 2016 machine. You could potentially deploy the SolarWinds AWS on the same IIS server as other applications, as long as you have ~4 GB of free RAM on that machine.


16 Replies

Hey Marc, we are using the following: https://servername:17778/SolarWinds/InformationService/v3?wsdl This checks that the InformationService is running on the box, but there could be better ways.

- David Smith
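For anyone who wants to try that URL outside the load balancer first, here is a rough Python sketch of the same check. Only the URL pattern and port 17778 come from the post above; the function names, the "HTTP 200 plus a body containing `definitions`" rule and the option to skip certificate checks are my own assumptions, and the thread does not say whether the endpoint asks for credentials in your version (an error response is simply treated as unhealthy here).

```python
import ssl
import urllib.error
import urllib.request


def wsdl_url(host: str, port: int = 17778) -> str:
    """Build the InformationService WSDL URL quoted in the post above."""
    return f"https://{host}:{port}/SolarWinds/InformationService/v3?wsdl"


def is_healthy(status: int, body: str) -> bool:
    """Assumed rule: HTTP 200 and a body that looks like a WSDL document."""
    return status == 200 and "definitions" in body


def probe(host: str, timeout: float = 5.0, verify_tls: bool = True) -> bool:
    """Fetch the WSDL once and report whether the node looks healthy."""
    context = ssl.create_default_context()
    if not verify_tls:
        # Assumption: the server may present a self-signed certificate.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(
            wsdl_url(host), timeout=timeout, context=context
        ) as resp:
            # Read only the first 64 KiB; enough to spot the WSDL root element.
            body = resp.read(65536).decode("utf-8", errors="replace")
            return is_healthy(resp.status, body)
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, TLS failure or an HTTP error status.
        return False
```

Calling `probe("servername", verify_tls=False)` from a machine that can reach the web server returns `True` or `False`; an F5 monitor would express the same send/receive rule in its own configuration.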

Thanks David - I'll pass that on to my NW engineering team.


The other option recommended to me by a colleague was to use a Virtual Directory and put some sort of switch file in that directory. We use that in our clusters in order to easily take servers out of the pool for maintenance. But I'm not sure if that is a company configuration or something that is easily replicated.

- David Smith
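The switch-file idea can be sketched in a few lines of Python. This is only an illustration of the logic: the file name `online.txt` and the function names are made up for the example, and in a real deployment the load balancer would request the file over HTTP through the virtual directory rather than read the disk.

```python
from pathlib import Path


def in_pool(switch_dir: Path, switch_name: str = "online.txt") -> bool:
    """A server counts as in the pool only while the switch file exists."""
    return (switch_dir / switch_name).is_file()


def take_out_of_pool(switch_dir: Path, switch_name: str = "online.txt") -> None:
    """Remove the switch file so health checks start failing (maintenance)."""
    (switch_dir / switch_name).unlink(missing_ok=True)


def put_back_in_pool(switch_dir: Path, switch_name: str = "online.txt") -> None:
    """Recreate the switch file so health checks pass again."""
    (switch_dir / switch_name).write_text("ok")
```

The appeal of this design is that taking a server out for maintenance is a file operation on that server, with no change needed on the load balancer itself.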

Ravik,

A couple of things: if it's feasible, I would suggest investing in a second AWS, moving all users off the primary polling engine and setting up a load balancer in front of your AWSs.  We have well over 100 users who could use SolarWinds at any given time.  Here's what our SW environment looks like. The F5 load balancer is set up to round-robin users across the 3 AWSs.  The only people who use the primary polling engine web services are the system admins.

pastedImage_1.png

We also just rebuilt all of our servers and moved to a new drive model that really helped the performance of SolarWinds.  We used to just have C:/OS and D:/Programs and/or C:/OS & Web Servers and D:/Programs.

Here's how we built our systems:

Web Servers.

C:/ OS

D:/ Paging

E:/ Programs

F:/ Web

G:/ Logs

Additional Polling Engines:

C:/ OS

D:/ Paging

E:/ Programs & TFTP/SFTP

Primary Polling Engine

C:/ OS

D:/ Paging

E:/ Programs & TFTP/SFTP

F:/ Web

G:/ Logs

I'm getting the message, "There was an error while rendering with id 3007" and the web console page does not fully load.


I've put 2 AWS behind an F5 Load Balancer configured to use the balanced load algorithm. When accessing the web console using the F5 VIP the page doesn't fully load and I'm getting the error "There was an error while rendering resource with id 3007".


I like this write up and diagram

We're currently in the process of re-designing our system and really need to improve performance...we are also using the common C drive D drive layout you were...how did the new drive layout increase performance?

You running a physical DB server? Size?

Are you running the NTA database on separate hardware? VM or Physical?

How many nodes are you monitoring?


Polling intervals and elements?

We are also pumping Traps, Syslog, and Netflow through VIPs, and I don't feel like we've ever had performance issues in this realm...we filter Syslog through a central rsyslog server before we send it off to SolarWinds, so that cuts down on the noise...we only alert on Emergency and Critical...we do get a lot of traps, 5-6 million a day, but for both syslog and traps we only keep 1 day

The biggest issue we've ever had was letting the NCM_CacheDIFFResults table get out of hand, so now the purge job doesn't run properly anymore


This is very similar to what we're scaling up to.  I'm curious about what health check mechanism the F5 is doing to ensure each AWS is healthy and able to handle requests.

Thanks!

-Keith

keith.levine.hbc​ did you ever get a health check configured for the F5 aside from a ping or main login page?


As has been mentioned, as the web application is still 32-bit, this is a majorly limiting factor in its performance. Whilst the documents state ~20 concurrent users, this really does depend on how often users are loading pages and performing activities in the web interface.

Things you can do to improve things:

1. Follow KMSigma​'s excellent guide on tuning IIS - https://blog.kmsigma.com/2016/10/30/how-i-build-an-orion-server/

2. Install an additional AWS server

3. Configure an IIS Cluster or, better still, utilise a proper HTTP Load Balancer such as an F5, which will manage the load applied to each server. Note: it must support session management; we go for the balanced load algorithms.

Once an IIS farm is built you can either add more if you still have issues, or just bring in the main Orion server to help out, but my preference is to always leave the main Orion server to perform all duties except web service.
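To make the "session management" requirement concrete, here is a toy Python sketch of round-robin distribution with session stickiness: new sessions are handed out in turn, and an existing session always returns to the server it started on. The class and the server names are hypothetical, and plain round-robin is swapped in for simplicity; this is not how an F5 or its balanced load algorithm is implemented.

```python
from itertools import cycle
from typing import Dict, Iterable


class StickyRoundRobin:
    """Round-robin for new sessions, same server for known sessions."""

    def __init__(self, servers: Iterable[str]) -> None:
        pool = list(servers)
        if not pool:
            raise ValueError("at least one web server is required")
        self._next_server = cycle(pool)
        self._assigned: Dict[str, str] = {}

    def route(self, session_id: str) -> str:
        """Return the server that should handle this session's request."""
        if session_id not in self._assigned:
            # First request of a session: take the next server in rotation.
            self._assigned[session_id] = next(self._next_server)
        return self._assigned[session_id]


lb = StickyRoundRobin(["aws1", "aws2", "aws3"])
print(lb.route("alice"))  # aws1
print(lb.route("bob"))    # aws2
print(lb.route("alice"))  # aws1 again, because the session is pinned
```

Without the pinning step, consecutive requests from one logged-in user could land on different web servers, which is why the load balancer has to track sessions.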


Thanks bogdan.stan@xpo.com

Appreciate your inputs.

Working with Solarwinds Support too. They have suggested an upgrade to 2018.4. Post the upgrade we shall also implement your suggestions.

gangadhar.kcsameer


I believe the scalability guidelines recommend an additional web server for every ~20 or so concurrent users.

- Marc Netterfield, Github

I would say if you have 100 users then you need at least 2 dedicated Web Servers, and leave the Primary Orion Server to perform the other functions such as polling, alerting, reporting etc.

- David Smith

Thanks David, we do have an additional web server, but we use it in conjunction with the main platform web server too. We read somewhere that 1 Additional Web Server can cater to only about 20 users, which is far too few. Does this mean that we need 5 web servers to handle 100 users? What if each user has multiple tabs open in his browser and each tab is actively refreshing the nodes and interfaces under monitoring? Does each tab count as a separate session, since each tab may have different nodes opened for monitoring?

Appreciate your help!! Thanks

Hi RaviK,

If you're running NPM 12.3 or later then a single web server should be able to handle 25-50 concurrent USERS. Tabs don't really count, as only the active tab loads fresh data, not the ones in the background. I guess the question might be: out of your potential 100 users, are they all logged in at the same time, or are they shift based etc.? I would say get yourself another AWS and then balance your users across the two servers, keeping the Primary Poller free (or at least only for your own administration).

Cheers

David

- David Smith
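The sizing arithmetic in this exchange is just a ceiling division, sketched here in Python. The per-server figures (about 20, or 25-50 on NPM 12.3 or later) are the ones quoted in the thread; the function name is made up, and real capacity depends on how heavy the views are.

```python
import math


def web_servers_needed(concurrent_users: int, users_per_server: int) -> int:
    """Smallest number of web servers that covers the concurrent users."""
    if concurrent_users < 0 or users_per_server <= 0:
        raise ValueError("users must be >= 0 and per-server capacity > 0")
    return math.ceil(concurrent_users / users_per_server)


print(web_servers_needed(100, 20))  # 5
print(web_servers_needed(100, 25))  # 4
print(web_servers_needed(100, 50))  # 2
```

Note that the input is concurrent users, not total accounts: if the 100 users work in shifts, the first argument is the peak number logged in at once.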