
Orion NPM Distributed Setup

I've recently become the admin for the Orion NPM system at the organization I work for, and I am still learning the system. We have a pretty large Orion setup with approximately 13,000 elements. Our current setup is as follows:

Server 1:

Orion Platform 2015.1.0, SAM 6.2.0, QoE 2.0, NCM 7.3.2, NPM 11.5, NTA 3.11.0, IVIM 2.0.0

Runs web, alerting, and the primary poller (~9,700 elements)

Server 2:

Secondary poller (~3,300 elements)

Server 3:

SQL Cluster node (Windows Failover)

Server 4:

SQL Cluster node (Windows Failover)

Our first poller is getting quite bogged down, and the Orion web front end is becoming very slow to load. I am looking to separate the polling functions from the front end. More specifically, I'd like to build a new server for the front end, and possibly a new one for alerting. Is there a recommended setup for distributing these services, and if so, can someone point me in the right direction? If not, does anyone have tips on how best to separate these services?

  • Without knowing the specs of these systems, I'd move as much polling as I could to Server 2.

    Then I'd look at database performance: make sure you're running RAID 10 with separate arrays for data, log (and system/temp) files.

    Make sure the database servers are not replicating synchronously (which requires every commit to complete on both systems).

    (Honestly, alerting does not put much load on the servers.)

    If you want to add an additional web server, that might help, but only if your current one is truly CPU-bound and not waiting on the database.

  • The system is running at a constant 50% to 100% CPU, and it is one of the higher-performing servers in our environment. "SolarWinds.DataProcessor.exe" seems to be the culprit. The SQL cluster is on a SAN back end. The problem I have with moving the pollers around is that many of our switches were set up with only the first poller's IP, and the second one was never added. I can't move them without breaking a lot of switches. Of course, that is a battle with our network team that I need to take on at some point. I'll have to do some SQL benchmarking, but I haven't had much of a problem with it when running custom queries in SQL Server Management Studio.

    You mention an "additional" web server. So I can set up another server with just the web front end? That would be great, because then I don't have to disrupt the current setup and can do my testing that way.

  • jasongreb wrote:

    The problem I have with moving around the pollers is that many of our switches were set up with the first poller IP and they never added the second one. I can't move them without breaking a lot of switches. Of course, that is a battle with our network team that I need to take on at some point.

    I note you have NCM -- surely you can push out the missing config bits to your switches using that?

    Not only that: you can generate a compliance report listing the switches that have only one poller address, attach a remediation script that adds the missing one, and then push them into compliance.

    You mention "additional" web server. So I can set up another server with just the web front end?

    This: Orion Web Server Engine – increase user access

  • Thank you for that! I'm not a network guy, but I will talk to the network team about doing that. It would be very helpful for fixing them all.

  • Click Settings at the top right, then click Polling Engines. Check your polling rate; I'm willing to bet it is pretty high. Take note of the value.

    Go back to Settings and click Polling Settings. Take a screenshot of this page so you have a record of the current polling intervals. Increase the values for "Polling Intervals" and "Polling Statistics Intervals" (e.g. 120 seconds to 300 seconds and 30 minutes to 60 minutes), then click "Re-apply Polling Intervals". Wait about 5 to 10 minutes and check the polling rate on the Polling Engines screen again.

    Once it drops, go back to the Orion Polling Settings screen and set your Polling Intervals and Polling Statistics Intervals back to what they were.

    Wait 10 minutes, check your polling rate again, and keep monitoring performance.

    This worked for me, and I have a single poller with over 11,000 elements. It dropped my polling rate from 91% to 52%. I hope it helps in your environment.
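To see why temporarily raising the intervals relieves a busy poller, here is a rough back-of-the-envelope sketch. It assumes, purely for illustration, that every element is polled once per status interval, so the raw request rate is simply elements divided by interval. SolarWinds' own "polling rate" percentage is calculated differently and also depends on statistics polling, so treat this as an approximation only. `polls_per_second` is a hypothetical helper, not part of any Orion API.

```python
def polls_per_second(elements: int, interval_seconds: float) -> float:
    """Rough request rate if each element is polled once per interval (illustrative assumption)."""
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    return elements / interval_seconds


if __name__ == "__main__":
    elements = 11_000  # element count from the reply about a single poller
    for interval in (120, 300):  # original vs. temporarily raised status interval, in seconds
        rate = polls_per_second(elements, interval)
        print(f"{interval:>3} s interval -> about {rate:.1f} polls/s")
```

Running it prints about 91.7 polls/s at 120 s and about 36.7 polls/s at 300 s. These are raw request rates under the stated assumption, not Orion's percentage. Going from 120 s to 300 s cuts that raw rate to 40% of its previous value (120/300). The arithmetic does not explain why the rate stays lower after the intervals are restored; that part of the tip is the poster's own observation.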
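The NCM compliance idea raised earlier in the thread boils down to a simple check: flag any device whose configuration does not reference every poller address. Below is a minimal Python sketch of that logic run against exported config text. It is not NCM's own policy engine. The poller IPs (from the 192.0.2.0/24 documentation range), the device names, and the `snmp-server host` / `access-list` lines in the sample configs are hypothetical placeholders. Adapt the matching to however your switches actually reference the pollers.

```python
import re

# Hypothetical poller addresses (documentation range); substitute your own.
POLLER_IPS = ("192.0.2.10", "192.0.2.20")


def missing_pollers(config_text: str, poller_ips=POLLER_IPS) -> list[str]:
    """Return the poller IPs that never appear as a whole address in the config."""
    missing = []
    for ip in poller_ips:
        # Require non-address characters on both sides so 192.0.2.10 does not match 192.0.2.100.
        pattern = r"(?<![\d.])" + re.escape(ip) + r"(?![\d.])"
        if not re.search(pattern, config_text):
            missing.append(ip)
    return missing


def compliance_report(configs: dict[str, str]) -> dict[str, list[str]]:
    """Map each non-compliant device name to the poller IPs it is missing."""
    report = {}
    for device, text in configs.items():
        gaps = missing_pollers(text)
        if gaps:
            report[device] = gaps
    return report


if __name__ == "__main__":
    # Hypothetical sample configs: switch-a knows both pollers, switch-b only the first.
    sample_configs = {
        "switch-a": "snmp-server host 192.0.2.10 public\nsnmp-server host 192.0.2.20 public\n",
        "switch-b": "access-list 10 permit 192.0.2.10\n",
    }
    for device, gaps in compliance_report(sample_configs).items():
        print(f"{device}: missing {', '.join(gaps)}")
```

In the sample, switch-a references both pollers and passes, while switch-b is reported as missing 192.0.2.20. A plain substring test would be simpler, but it would wrongly treat 192.0.2.100 as a match for 192.0.2.10, which is why the pattern checks the characters on either side of the address.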