Without knowing the power of the systems, I'd move as much polling as I could to server 2
Then I'd look at the database performance: make sure you're running RAID 10 with separate arrays for Data, Log (and System/Temp).
Make sure the database servers are not synchronous (requiring every commit to complete on both systems).
(honestly, alerting is not that high a load on the servers)
If you want to add an additional webserver then that might help, but only if your current one truly is bogged down by CPU, and not stuck in the database...
The system is running at a constant 50% to 100% CPU, and is one of our higher performers in our environment. "SolarWinds.DataProcessor.exe" seems to be the culprit. The SQL cluster is on a SAN backend. The problem I have with moving around the pollers is that many of our switches were set up with the first poller IP and they never added the second one. I can't move them without breaking a lot of switches. Of course, that is a battle with our network team that I need to take on at some point. I'll have to do some SQL benchmarking, but I haven't had too much of a problem with it when doing custom queries with SQL Management Studio.
You mention "additional" web server. So I can set up another server with just the web front end? That would be great because then I don't have to interrupt the current setup and I can do testing that way.
The problem I have with moving around the pollers is that many of our switches were set up with the first poller IP and they never added the second one. I can't move them without breaking a lot of switches. Of course, that is a battle with our network team that I need to take on at some point.
I note you have NCM -- surely you can push out the missing config bits to your switches using that?
Not only that, you can generate a compliance report of switches that have only one poller address, and a remediation script that will fix it, and can then zap them into compliance.
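To give a feel for what such a compliance check looks for, here is a minimal Python sketch. It is not NCM's own report or policy language, just the same logic applied to config text you have already exported. The poller addresses, switch names, and ACL lines below are all made-up placeholders, and I'm assuming the poller IP shows up as a plain token somewhere in each config.

```python
# Hypothetical poller addresses (documentation range); substitute your own.
POLLERS = ["192.0.2.10", "192.0.2.11"]


def missing_pollers(config_text, poller_ips):
    """Return the poller IPs that never appear as a whole token in the config."""
    tokens = set(config_text.split())
    return [ip for ip in poller_ips if ip not in tokens]


def compliance_report(configs, poller_ips):
    """Map switch name -> missing poller IPs, listing only non-compliant switches."""
    report = {}
    for name, text in configs.items():
        missing = missing_pollers(text, poller_ips)
        if missing:
            report[name] = missing
    return report


# Made-up sample configs keyed by switch name.
configs = {
    "sw-a": "access-list 10 permit 192.0.2.10\naccess-list 10 permit 192.0.2.11",
    "sw-b": "access-list 10 permit 192.0.2.10",
}

for switch, missing in compliance_report(configs, POLLERS).items():
    print(switch, "is missing", ", ".join(missing))
```

Matching on whole tokens rather than substrings keeps an address like 192.0.2.100 from being counted as 192.0.2.10. The remediation lines themselves depend on how your switches reference the poller, so that part is best left to the network team.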
You mention "additional" web server. So I can set up another server with just the web front end?
Thank you for that! I am not a network guy, but I will talk to the network team about doing that. That would be very helpful to fix them all up.
Click on settings at the top right and click on Polling Engines. Check your polling rate. I'm willing to bet it is pretty high. Take note of the value.
Go back to settings and click on polling settings. Take a screenshot of this page so you know what the current polling intervals are. Increase the values for the "Polling Intervals" and "Polling Statistics Intervals", e.g. 120 seconds --> 300 seconds and 30 minutes --> 60 minutes. Click on "Re-apply Polling Intervals". Wait about 5 to 10 minutes and check the polling rate on the Polling Engines screen.
Once it drops, go back to the Orion Polling Settings screen and set your Polling Intervals and Polling Statistics Intervals back to what they were.
Wait 10 minutes and check your polling rate again and monitor performance.
This worked for me and I have 1 Poller with over 11,000 elements. It dropped my polling rate from 91% to 52%. Hope it helps you in your environment.
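As a rough back-of-envelope for why the rate falls while the longer intervals are in effect: if every element is polled once per interval, the number of requests per second scales with 1 / interval. The little Python sketch below only illustrates that arithmetic. It is my own simplification, not how Orion calculates the polling rate percentage, and the function name is made up.

```python
def relative_poll_load(old_interval, new_interval):
    """Fraction of the original request rate left after changing the interval.

    Assumes each element is polled exactly once per interval, so the load is
    proportional to 1 / interval. Both intervals must use the same unit.
    """
    if old_interval <= 0 or new_interval <= 0:
        raise ValueError("intervals must be positive")
    return old_interval / new_interval


# The example values from the steps above.
status_load = relative_poll_load(120, 300)  # seconds: 120 --> 300
stats_load = relative_poll_load(30, 60)     # minutes: 30 --> 60

print(f"status polling load: {status_load:.0%} of original")
print(f"statistics polling load: {stats_load:.0%} of original")
```

Under that assumption, going from 120 to 300 seconds leaves about 40% of the original status polling requests, and going from 30 to 60 minutes halves the statistics polling.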