Hi there...
I have searched around the forums and maybe I'm missing something but wanted to ask for opinions and feedback.
We have two physical servers for Solarwinds products today which include NPM, SAM, and NCM. The system is under a lot of load and we are looking to upgrade.
I'm looking at replacing the two existing servers with newer hardware and beefy specs. I'm also looking to upgrade from Windows Server 2008 to Windows Server 2012 Standard Edition (plus licensing for cores etc). Hardware-wise I believe I have a good grasp on things - I like to beef things up way beyond the published "specs", as I always find them to be the "bare minimum".
Software-wise is where I get a bit confused, so I'm hoping for some input.
The new system I'm proposing looks like this:
Main application server (web server, primary server) - Dell R720, Dual CPU (8 cores), 128 GB RAM, 16 x 146 GB 15k SAS, RAID, Server 2012 Standard
SQL 2012 server (dedicated to only Solarwinds) - Dell R910, Quad CPU (32 cores), 256 GB RAM, 16 x 600 GB 15k SAS, RAID, Server 2012 Standard, SQL 2012 Standard
3 x Additional Polling Engines - Dell R320, single CPU (4 cores), 64 GB RAM, 4 x 146 GB 15k SAS, RAID, Server 2012 Standard
Obviously that's a fair chunk on hardware, along with a significant chunk on additional licenses (Windows, SQL, and 3 polling engine licenses).
Today, we are monitoring:
| Network Elements | 5056 |
| Nodes | 1110 |
| Interfaces | 3610 |
| Volumes | 336 |
Looking at the above numbers, it's important to note that we poll interfaces every 5 minutes instead of the default 9 minutes. Based on that and some other details, we have concluded that we are maxing out our current system. We also have approximately 3500 nodes that I would like to migrate from another system over to Solarwinds, along with much more intensive application-level monitoring we want to add. At some point I'd like to look at adding Netflow as well, but for now we'll leave that out of the equation.
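To put a rough number on what the shorter interval costs, here is a back-of-the-envelope sketch in Python. It only divides the interface count from the table by the two polling intervals; the helper name `polls_per_second` is my own, and it assumes polls are spread evenly over the interval, which the real poller may not do.

```python
# Rough estimate only: assumes interface polls are spread evenly across
# the polling interval. Not a model of how Solarwinds actually schedules work.

def polls_per_second(element_count: int, interval_seconds: int) -> float:
    """Average number of polls per second for a given element count."""
    return element_count / interval_seconds

interfaces = 3610            # from the table above
default_interval = 9 * 60    # default interface polling interval, in seconds
current_interval = 5 * 60    # the interval we actually use, in seconds

default_rate = polls_per_second(interfaces, default_interval)
current_rate = polls_per_second(interfaces, current_interval)
load_factor = current_rate / default_rate

print(f"default: {default_rate:.2f} polls/s")
print(f"current: {current_rate:.2f} polls/s")
print(f"relative interface polling load: {load_factor:.1f}x")
```

In other words, polling every 5 minutes instead of every 9 means roughly 1.8 times the interface polling work for the same number of interfaces, before any of the extra 3500 nodes are added.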
How do the polling engines distribute load? The docs indicate that you can run them in "auto" mode and let the system determine which poller to assign to a node. What happens when a polling engine is taken offline - does one of the other polling engines take over, or do the nodes that polling engine is responsible for simply lose data for that time period?
We have no failover plans at this point - we need to get a solid system running first and then look at adding failover to another location in the future.
I'm assuming that throwing lots of RAM at these servers will help with performance - has anyone found otherwise?
Many thanks for any input...
Paul