MSSQL performance tuning

jlintner

Lately I've been experiencing some bad performance on our Orion NPM polling and database servers and trying different things to fix/improve it. I think we are experiencing SQL timeouts between the polling engine and SQL database. I'm not a DBA, but know some minimal stuff about SQL, so here goes.

Polling server: We are running NPM and NCM on a dedicated server, a Dell 2950 - (2) dual-core 3 GHz CPUs, 8 GB RAM, Win2k3 R2, 64-bit OS. We also have the VoIP and NTA modules. I haven't had a chance to fully dive into the VoIP monitor, but we do use NTA to troubleshoot sites with heavy WAN utilization. We currently have about 2700 nodes/4000 elements, mostly Cisco devices.

Database server: A dedicated database server for Solarwinds, a Dell 2950 - (2) dual-core 3 GHz CPUs, 16 GB RAM, Win2k3 R2, 64-bit OS, MSSQL 2005 Enterprise, also 64-bit. The drives are configured like this:

C: RAID 1; (2) 72GB SAS drives; OS and system databases except for TEMPDB
E: RAID 1; (2) 72GB SAS drives; MSSQL log files for TEMPDB and Solarwinds related databases
F: RAID 1; (2) 136GB SAS drives; MSSQL database files for TEMPDB and Solarwinds related databases

We recently added the last two drives, separated the database and log files, and moved TEMPDB off the C: drive. That helped, but we still have a high disk queue length for the E: drive, and database activity is constant for the NPM database.
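For anyone reading along who wants to put a number on "high disk queue length": a commonly cited rule of thumb (not a SolarWinds figure) is that a sustained perfmon Avg. Disk Queue Length of more than about 2 per spindle suggests the array is a bottleneck. Below is a minimal Python sketch of that check. The threshold and the reading of 9.0 are assumptions for illustration only, not values measured on these servers.

```python
def queue_per_spindle(avg_queue_length, spindles):
    """Normalize a perfmon Avg. Disk Queue Length reading by the spindle count."""
    if spindles <= 0:
        raise ValueError("spindles must be positive")
    return avg_queue_length / spindles


def looks_saturated(avg_queue_length, spindles, threshold=2.0):
    """Apply the rule of thumb: more than `threshold` queued I/Os per spindle.

    The default threshold of 2.0 is an assumed rule-of-thumb value.
    """
    return queue_per_spindle(avg_queue_length, spindles) > threshold


# Hypothetical reading for a 2-disk RAID 1 volume such as E:.
reading = 9.0
print(queue_per_spindle(reading, 2))  # 4.5
print(looks_saturated(reading, 2))    # True
```

Note that on a mirror every write goes to both disks, so counting a RAID 1 pair as two spindles is generous for a write-heavy log volume.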

Looking for other ways to improve performance, I noticed under the DB properties for the two Solarwinds databases (NPM and NCM) that the "full-text indexing" option is checked. Some DBAs I've talked to said to turn that off and it should improve performance. Is there any requirement for the Solarwinds databases to have that enabled? I searched this forum, and the Admin guide for NPM but couldn't find anything on that setting.

We've also recently upgraded firmware and drivers for the servers/RAID controllers/NICs to help, along with reducing our polling frequency.

Some of our upper management is losing their faith with Solarwinds since we sometimes get alerts for multiple sites going down, then coming back up hours later, but other tools never see the outage. Another group here wants to put all of their servers in, about 800 more "nodes", along with UPSs - about 1000 of those. I don't think it can handle the additional load!

What else can I do to get this application working reliably? I don't think we are pushing the limits of the system, are we? I would rather not have to tell management that we have to spend another $20K for another poller!

HELP!

Accepted answers

chris.lapoint

What version of Orion NPM, NTA, and NCM are you running?

Can you find out how many flows per second (FPS) you are receiving on the NetFlow side of things? This module typically puts a fair share of load on the DB server. To get flows per second, you'll need to run Windows perfmon and add the NetFlow counter called PDU Per Sec.
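If the counter is logged to a CSV file from perfmon, the average and peak can be pulled out with a few lines of Python. This is only a sketch: it assumes the usual perfmon CSV layout (a timestamp column followed by one column per counter), and the host name ORION01, the counter object name, and the sample values are made up for illustration. Only the counter name PDU Per Sec comes from the post above.

```python
import csv
import io


def summarize_counter(csv_text, counter_name):
    """Return (average, peak, sample_count) for the first column whose
    header contains counter_name (case-insensitive)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    matches = [i for i, name in enumerate(header)
               if counter_name.lower() in name.lower()]
    if not matches:
        raise KeyError(f"no column matching {counter_name!r}")
    col = matches[0]
    values = []
    for row in reader:
        if col >= len(row):
            continue  # short or empty row
        try:
            values.append(float(row[col].strip()))
        except ValueError:
            continue  # blank cell: the sample was missed
    if not values:
        raise ValueError("no numeric samples found")
    return sum(values) / len(values), max(values), len(values)


# Made-up sample log; host and object names are hypothetical.
SAMPLE = r'''"(PDH-CSV 4.0) (Eastern Standard Time)(300)","\\ORION01\NetFlow\PDU Per Sec"
"03/10/2009 10:00:00.000","120"
"03/10/2009 10:00:15.000"," "
"03/10/2009 10:00:30.000","180"
'''

average, peak, count = summarize_counter(SAMPLE, "PDU Per Sec")
print(average, peak, count)  # 150.0 180.0 2
```

Matching on a substring of the header avoids hard-coding the full counter path, which differs per server.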

Regardless of FPS, there are certainly Orion NTA performance tuning options that you can enable that should help reduce your SQL DB load (provided you are running NTA 3.5 SP2). See this KB article for details

All comments

mcbridea

Your element count for NPM is well within the normal limits but I'm wondering what the NTA load is? How many NTA sources do you have and what bandwidth circuits are they monitoring? How many nodes are there in NCM?

For the SQL server you have made some good changes but you will get much improved performance by going to a RAID 10 with 6 or more spindles for the NetPerfMon database. Full-text indexing should be on for NCM but is not required. Searches and such in NCM will be faster with it.

The additional poller price is less than the primary SLX poller. Sales can get you an exact figure.

Andy

FormerMember

Looking for other ways to improve performance, I noticed under the DB properties for the two Solarwinds databases (NPM and NCM) that the "full-text indexing" option is checked. Some DBAs I've talked to said to turn that off and it should improve performance. Is there any requirement for the Solarwinds databases to have that enabled?

I'm also curious about the answer to jlintner's question.

savell

I agree with Andy's comment regarding the number of spindles you have for the database. Our database currently works very well on RAID10 array with 6x146Gig 15k SAS drives, front-ended with a raid card with 512Meg cache (25% read and 75% write).

This handles 16,000 elements across three pollers nicely (with minimal disk queues).
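To give a rough feel for why the spindle count matters, here is a back-of-the-envelope Python sketch comparing a 2-disk RAID 1 volume with a 6-disk RAID 10 array. The write penalty of 2 for mirrored writes is standard for RAID 1 and RAID 10. The 175 IOPS per 15k drive and the 75% write mix are assumptions chosen for illustration, not measurements from either system, and the estimate ignores controller cache entirely.

```python
def usable_iops(spindles, iops_per_spindle, write_fraction, write_penalty=2):
    """Estimate the host-visible IOPS of an array.

    Each host write costs `write_penalty` disk operations (2 for RAID 1
    and RAID 10, since every write lands on both halves of a mirror).
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0 and 1")
    raw = spindles * iops_per_spindle
    cost_per_host_io = (1.0 - write_fraction) + write_fraction * write_penalty
    return raw / cost_per_host_io


IOPS_PER_15K_DISK = 175  # assumed rule-of-thumb figure
WRITE_FRACTION = 0.75    # assumed workload mix

print(usable_iops(2, IOPS_PER_15K_DISK, WRITE_FRACTION))  # 200.0
print(usable_iops(6, IOPS_PER_15K_DISK, WRITE_FRACTION))  # 600.0
```

Under these assumptions the 6-disk array sustains about three times the I/O of a single mirror, which matches the general advice in this thread to add spindles.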

I should also note that we run the database in the simple recovery model (i.e. the transaction log is truncated at each checkpoint rather than kept for log backups), which also reduces the disk I/O, but could mean we would need to restore to the last backup should we have a database failure (we would need to lose two drives in the RAID10 array for this to occur, however).

Dave.

jlintner

Your element count for NPM is well within the normal limits but I'm wondering what the NTA load is? How many NTA sources do you have and what bandwidth circuits are they monitoring? How many nodes are there in NCM?

I want to say the number of NTA nodes is about 30. Most have T1 circuits, some have 2xT1. I do have two server farm switches (6509E) sending NTA data from the EtherChannel ports connecting back to our core switches - security wants to put in inline NIDS and they requested flow data to see what we should expect. I've been meaning to go back and turn off the flow data from some of the lesser sites, for now.

I removed most of the devices from NCM today, since I wasn't fully understanding what management wanted out of this. I have a better idea and will re-populate it soon, but most likely 2000 nodes.

For the SQL server you have made some good changes but you will get much improved performance by going to a RAID 10 with 6 or more spindles for the NetPerfMon database. Full-text indexing should be on for NCM but is not required. Searches and such in NCM will be faster with it.
The additional poller price is less than the primary SLX poller. Sales can get you an exact figure.
Andy

I had a quote for an additional poller, GSA price, and that factored with server hardware, with other required licenses (OS, backup client agents, anti-virus/HIDS agents, etc.), it added up pretty quick.

We are running Orion v9.1 SP4 SLX, which is also running the web site. I'm not currently able to get to the servers (I'm at home) to verify the other app versions, but they were all updated at the same time when NPM went to 9.5 SP3.

It sounds like SAN space for the database is the answer, with RAID 10 (I would've preferred that but when the server was originally stood up, it wasn't an option), six spindles. I'll start working on that change request. I've been making other minor changes to try and improve this, but I want a solid answer - a final "this will fix Solarwinds" answer I can take to upper management. Looking toward the future, how much SAN space should I request? 200GB?

So you're saying that turning off full-text indexing won't help much? What else can I look at, such as polls per second or polling frequency? What about anti-virus live scanning exclusions?

mcbridea

Hi jlintner,

Virus scanning can cause major performance issues, especially with live scanning. I'd look at that and at increasing spindles rather than an additional poller. Be careful with the SAN. We have customers with mixed results with SANs, as they are not engineered for the very rapid read/write performance NTA requires.

jlintner

Thanks for the reply, Andy. We turned off the CBQoS feature for NTA, and that made the largest improvement in our database performance. I have been working with the SAN team here and they are working on allocating SAN space for the databases. Since they know SANs, we will leave the actual RAID design to them. They keep saying that as long as the SAN cache memory can keep up with the server requests, we shouldn't worry about the back-end RAID configuration. I told them that as long as we get good performance and they can migrate the data to better-performing disk groups as needed, I don't have any problem with that. The SAN/database changes should happen next week.

Otherwise, things are currently under control with one polling server, but you may want to look at support case: 129852.