
Orion Web Server Scaling

Hi Thwacksters,

Has anyone dealt with an environment where more than 100 Orion web console users access the monitoring platform (including NPM, NTA, NCM, VNQM, IPAM, UDT, SAM, WPM, SRM and DPA)?

The users are segregated as Network, Server, Application, DB and Platform users. Each of the teams has separate users that need to monitor their respective devices.

We have split the users amongst 2 web servers - one sitting on the main platform and an Additional Web server. The main platform does not run the SQL DB services - the Orion database has been built on a separate server.

We observe that CPU utilization on the core Orion platform reaches 100% very frequently. We often have to stop and restart the Orion services to bring the CPU utilization back down.

Note that we have followed best practices by providing more than the recommended CPU, RAM and disk IOPS for the core platform, the database server and the Additional Web Server, as well as for the 6 Additional Polling Engines.

Any pointers to articles that may help us design a better web access mechanism for the Orion monitoring platform are most welcome.

regards

gangadhar.kcsameerc_sameerabdhijasharma

  • I would say if you have 100 users then you need at least 2 dedicated Web Servers, and leave the Primary Orion Server to perform the other functions such as polling, alerting, reporting etc.

  • Thanks David, we do have an additional web server, but we use it in conjunction with the main platform web server too. We read somewhere that 1 Additional Web Server can cater to only about 20 users, which is far too few. Does this mean that we need 5 web servers to handle 100 users? What if each user has multiple tabs open in their browser and each tab is actively refreshing the nodes and interfaces under monitoring? Does each tab count as a separate session, since each tab may have different nodes open for monitoring?

    Appreciate your help!! Thanks

  • Hi RaviK,

    If you're running NPM 12.3 or later, a single web server should be able to handle 25-50 concurrent USERS. Tabs don't really count, as only the active tab loads fresh data; the others are not refreshing in the background. I guess the question is whether your potential 100 users are all logged in at the same time, or whether they are shift based, etc. I would say get yourself another AWS and then balance your users across the two servers, keeping the Primary Poller free (or at least only for your own administration).

    Cheers

    David
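The sizing question above comes down to a ceiling division. The following is a minimal sketch only, using the per-server figures quoted in this thread (about 20, or 25-50 on NPM 12.3 or later); the function name is made up for illustration and the result is not an official SolarWinds sizing rule.

```python
import math


def web_servers_needed(concurrent_users: int, users_per_server: int) -> int:
    """Return how many web servers are needed for a given concurrent load.

    users_per_server is an assumed capacity figure, not a measured one.
    """
    if concurrent_users < 0 or users_per_server <= 0:
        raise ValueError("users must be >= 0 and capacity must be > 0")
    # Round up: a partly used server is still a whole server.
    return math.ceil(concurrent_users / users_per_server)


if __name__ == "__main__":
    # Per-server capacities mentioned in the thread.
    for capacity in (20, 25, 50):
        print(capacity, "users/server ->", web_servers_needed(100, capacity), "servers")
```

With 100 truly concurrent users this gives 5, 4 and 2 servers respectively, which is why the number of users logged in at the same time matters more than the total number of accounts.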

  • I believe the scalability guidelines recommend an additional web server for every ~20 or so concurrent users.

  • The biggest limitation of the web server is the fact that it is 32-bit only, so once the process passes 1.5 GB it usually crashes. This could happen with anywhere from 1 user to 50 users, depending on the types of views they use.

    You mentioned you have SAM, so I would suggest you add the IIS AppInsight monitor on the additional web server and the main server to evaluate exactly how many unique connections you have and what the RAM usage is for IIS.

    Here are a couple of things you can do to minimize web load:

    - make sure all accounts have a session timeout

    - customize each view and eliminate unneeded widgets.

    - change all charts to load a maximum of 24 hours of data instead of 7 days.

    - check the scheduled reports and try to move them to a reporting or BI platform running directly on the DB.

    If you really have 100 concurrent sessions you need at the very least 1 more AWS. The licence is pretty cheap (~500 EUR); the only catch is that you can't stack them, so you will need a dedicated 2 CPU, 8 GB RAM Windows 2016 machine. You could potentially deploy the SolarWinds AWS on the same IIS server as other applications, as long as you have ~4 GB of free RAM on that machine.

  • Thanks bogdan.stan@xpo.com

    Appreciate your inputs.

    We are working with SolarWinds Support too. They have suggested an upgrade to 2018.4. After the upgrade we will also implement your suggestions.

    gangadhar.kcsameer

  • As has been mentioned, the web application is still 32-bit, and this is a major limiting factor in its performance. Whilst the documents state ~20 concurrent users, this really does depend on how often users are loading pages and performing activities in the web interface.

    Things you can do to improve things:

    1. Follow KMSigma's excellent guide on tuning IIS - https://blog.kmsigma.com/2016/10/30/how-i-build-an-orion-server/

    2. Install an additional AWS server

    3. Configure an IIS cluster or, better still, utilise a proper HTTP load balancer such as an F5, which will manage the load applied to each server. Note: it must support session management, and we use the balanced load algorithms.

    Once an IIS farm is built you can either add more if you still have issues, or just bring in the main Orion server to help out, but my preference is to always leave the main Orion server to perform all duties except web service.
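The session management requirement can be illustrated with a small simulation. This is a sketch only: it uses plain round robin for new sessions (one simple balancing algorithm, chosen here for illustration) and keeps each existing session on the server it was first given. The class name and server names are hypothetical, and a real load balancer implements this in its own configuration rather than in Python.

```python
from itertools import cycle


class StickyRoundRobin:
    """Round-robin assignment for new sessions, sticky for known ones."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._next_server = cycle(servers)  # endless rotation over servers
        self._assigned = {}                 # session id -> server

    def route(self, session_id: str) -> str:
        # A known session always returns to the same web server,
        # so the user's login state stays valid.
        if session_id not in self._assigned:
            self._assigned[session_id] = next(self._next_server)
        return self._assigned[session_id]


if __name__ == "__main__":
    lb = StickyRoundRobin(["aws1", "aws2", "aws3"])  # hypothetical names
    for user in ("u1", "u2", "u1", "u3", "u4"):
        print(user, "->", lb.route(user))
```

In this run u1 stays on aws1 on its second request, while u4, as a new session, wraps around to aws1 again. Without the sticky map, a returning user could land on a server that does not know their session.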

  • Ravik,

    A couple of things. If it's feasible, I would suggest investing in a second AWS, moving all users off of the primary polling engine and setting up a load balancer in front of your AWSs.  We have well over 100 users who could use SolarWinds at any given time.  Here's what our SW environment looks like. The F5 load balancer is set up to round-robin users across the 3 AWSs.  The only people who use the primary polling engine web services are the system admins.

    [Image: diagram of the SolarWinds environment (pastedImage_1.png)]

    We also just rebuilt all of our servers and moved to a new drive model that really helped the performance of SolarWinds.  We used to just have C:/OS and D:/Programs and/or C:/OS & Web Servers and D:/Programs.

    Here's how we built our systems:

    Web Servers:

    C:/ OS

    D:/ Paging

    E:/ Programs

    F:/ Web

    G:/ Logs

    Additional Polling Engines:

    C:/ OS

    D:/ Paging

    E:/ Programs & TFTP/SFTP

    Primary Polling Engine:

    C:/ OS

    D:/ Paging

    E:/ Programs & TFTP/SFTP

    F:/ Web

    G:/ Logs

  • This is very similar to what we're scaling up to.  I'm curious about what health check mechanism the F5 is using to ensure each AWS is healthy and able to handle requests.

    Thanks!

    -Keith

  • keith.levine.hbc, did you ever get a health check configured for the F5, aside from a ping or a check of the main login page?
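For readers with the same question, one common pattern that goes a step beyond a ping is a content check: request a page and require both an HTTP 200 and an expected string in the body. The sketch below shows that logic in Python purely for illustration. It is not F5 monitor syntax and not what anyone in this thread confirmed using, and the URL and marker text in the usage line are placeholders you would replace with values from your own environment.

```python
from urllib.request import urlopen


def looks_healthy(status: int, body: str, marker: str) -> bool:
    """Healthy means HTTP 200 and the expected marker text is present."""
    return status == 200 and marker in body


def check(url: str, marker: str, timeout: float = 5.0) -> bool:
    """Fetch url and apply looks_healthy; any failure counts as unhealthy."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return looks_healthy(resp.status, body, marker)
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError (4xx/5xx) and timeouts;
        # ValueError covers a malformed URL.
        return False


if __name__ == "__main__":
    # Placeholder host and marker, assumed for illustration only.
    print(check("http://aws1.example.com/", "Login"))
```

A content check catches the case where IIS still answers on the port but the application behind it returns an error page, which a ping or a bare TCP check would miss.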