Level 11

Orion NPM Architecture, Speed, and SQL

In my last post, "What NPM Tips and Tricks do You Have?" I asked about tips and tricks, expecting a mashup of different things from all over the NPM world and to a certain extent that's what happened. Interestingly, however, a large section of the thread turned into a discussion about two things: maps and speed.

There were certainly a lot of good map tips, and you can find more at Solarwinds Labs.  In fact, you can even find out how to make your boss happy with a Big Green Button.

The speed issue is particularly intriguing to me since there are a lot of times where, let's be honest here, NPM is a bit of a dog when it comes to response. The web interface is notoriously slow, and gets even worse when you have a ton of custom widgets, do-dads, and whatchamacallits loading on a screen. Several people mentioned that a lot of speed can be picked up by getting in at the database level and pre-packaging certain things.

ZachM wrote:

Stored Procedures and custom Views created in the DB save us countless man hours and, in my experience, working directly in the DB can really expand your knowledge of the architecture of NPM overall. I highly recommend every SolarWinds engineer to challenge themselves to learn more SQL. I am by no means a DBA, but I can pull every bit of data you can get from the website, and I can do it faster 90% of the time.
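As a rough illustration of the kind of custom view described above, here is a minimal sketch. The view name, the threshold, and the table and column names (dbo.Nodes, NodeID, Caption, IP_Address, ResponseTime) are assumptions about the Orion schema and should be checked against your own database before use.

```sql
-- Hypothetical view: nodes with a slow response time.
-- dbo.Nodes and its column names are assumptions about the Orion schema.
CREATE VIEW dbo.vw_SlowNodes
AS
SELECT NodeID, Caption, IP_Address, ResponseTime
FROM dbo.Nodes
WHERE ResponseTime > 200;  -- assumed threshold, in milliseconds
GO

-- Usage: the 20 slowest nodes according to the view.
SELECT TOP (20) *
FROM dbo.vw_SlowNodes
ORDER BY ResponseTime DESC;
```

Pre-packaging a filter like this lets a report or custom resource select from one object instead of repeating the same WHERE clause everywhere.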

NPM is an incredibly flexible and extensible product, especially in recent revisions, and offers a lot of opportunity for people willing to really dig in behind the scenes. As usual, I have more questions:

* What SQL version and architecture are you using (separate database, named instances, etc.)?

* What architecture have you found helps in the speed department?

As an example of what I'm interested in: we run Cisco UCS servers, with VMware as the hypervisor layer, backed by NetApp FAS3240 fully licensed arrays, with Flash Cache, etc. We tier our storage manually and have full production SQL and Oracle instances virtualized.  The storage is connected to the UCS with an aggregated 80GB, and the UCS to the core at 160GB.

35 Replies
Level 8

Out of interest, what's the largest SolarWinds configuration people have? We have performance issues on a regular basis. They always appear to be around the main poller services, and a restart resolves them.

Our platform is made up of the following:

2-node SQL cluster (physical box, Windows 2012, 2.20 GHz, 2 sockets, 16 cores, 32 logical processors, 128 GB memory, SQL limited to 112 GB)

1 primary poller (physical box, Windows 2012, 2.00 GHz, 2 sockets, 12 cores, 24 logical processors, 128 GB memory)

8 additional pollers (a mix of physical boxes spec'd as above and numerous VMs)

2 web servers

DPA (installed on the primary poller)

SRM

VIM.

0 Kudos
Level 21

For those of you who are experiencing problems and have tried reindexing your database, here are the SQL commands SolarWinds provided me. I have been using them to maintain my database and they have been working extremely well. The first three delete all Syslog and Trap data to speed up the commands that follow, so if you rely heavily on that data you can skip them. The fourth one reindexes every table in the database, and the last one checks index fragmentation and defragments any index at or above the threshold. Before you do any of this, I recommend taking a backup of your database.
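A backup along the lines recommended here might look like the sketch below. The database name and the file path are hypothetical placeholders, so substitute your own.

```sql
-- Sketch only: [SolarWindsOrion] and the path are assumed names.
-- COPY_ONLY avoids disturbing an existing backup chain.
BACKUP DATABASE [SolarWindsOrion]
TO DISK = N'D:\Backups\SolarWindsOrion_pre_maintenance.bak'
WITH COPY_ONLY, CHECKSUM, INIT;
```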

--SQL Query--
TRUNCATE TABLE SYSLOG

--SQL Query--
TRUNCATE TABLE TRAPS

--SQL Query--
TRUNCATE TABLE TRAPVARBINDS

--SQL Query--
Exec sp_msForEachTable @COMMAND1= 'DBCC DBREINDEX ("?")'

--SQL Query--
SET NOCOUNT ON;
DECLARE @tablename VARCHAR(128);
DECLARE @execstr VARCHAR(255);
DECLARE @objectid INT;
DECLARE @indexid INT;
DECLARE @frag decimal;
DECLARE @maxfrag decimal;
-- Decide on the maximum fragmentation to allow for.
SELECT @maxfrag = 1.0;
-- Declare a cursor.
DECLARE tables CURSOR FOR
SELECT CAST(TABLE_SCHEMA AS VARCHAR(100))
+'.'+CAST(TABLE_NAME AS VARCHAR(100))
AS Table_Name
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_TYPE = 'BASE TABLE';
-- Create the table.
CREATE TABLE #fraglist (
ObjectName CHAR(255),
ObjectId INT,
IndexName CHAR(255),
IndexId INT,
Lvl INT,
CountPages INT,
CountRows INT,
MinRecSize INT,
MaxRecSize INT,
AvgRecSize INT,
ForRecCount INT,
Extents INT,
ExtentSwitches INT,
AvgFreeBytes INT,
AvgPageDensity INT,
ScanDensity decimal,
BestCount INT,
ActualCount INT,
LogicalFrag decimal,
ExtentFrag decimal);
-- Open the cursor.
OPEN tables;
-- Loop through all the tables in the database.
FETCH NEXT
FROM tables
INTO @tablename;
WHILE @@FETCH_STATUS = 0
BEGIN;
-- Do the showcontig of all indexes of the table
INSERT INTO #fraglist
EXEC ('DBCC SHOWCONTIG (''' + @tablename + ''')
WITH FAST, TABLERESULTS, ALL_INDEXES, NO_INFOMSGS');
FETCH NEXT
FROM tables
INTO @tablename;
END;
-- Close and deallocate the cursor.
CLOSE tables;
DEALLOCATE tables;
-- Declare the cursor for the list of indexes to be defragged.
DECLARE indexes CURSOR FOR
SELECT ObjectName, ObjectId, IndexId, LogicalFrag
FROM #fraglist
WHERE LogicalFrag >= @maxfrag
AND INDEXPROPERTY (ObjectId, IndexName, 'IndexDepth') > 0;
-- Open the cursor.
OPEN indexes;
-- Loop through the indexes.
FETCH NEXT
FROM indexes
INTO @tablename, @objectid, @indexid, @frag;
WHILE @@FETCH_STATUS = 0
BEGIN;
PRINT 'Executing DBCC INDEXDEFRAG (0, ' + RTRIM(@tablename) + ',
' + RTRIM(@indexid) + ') - fragmentation currently '
+ RTRIM(CONVERT(VARCHAR(15),@frag)) + '%';
SELECT @execstr = 'DBCC INDEXDEFRAG (0, ' + RTRIM(@objectid) + ',
' + RTRIM(@indexid) + ')';
EXEC (@execstr);
FETCH NEXT
FROM indexes
INTO @tablename, @objectid, @indexid, @frag;
END;
-- Close and deallocate the cursor.
CLOSE indexes;
DEALLOCATE indexes;
-- Delete the temporary table.
DROP TABLE #fraglist;
GO
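Note that DBCC DBREINDEX, DBCC SHOWCONTIG, and DBCC INDEXDEFRAG are marked as deprecated in Microsoft's documentation, which points to ALTER INDEX and sys.dm_db_index_physical_stats instead. The following is a minimal read-only sketch of the same fragmentation check using the newer view. The 10 percent and 100 page thresholds are assumptions, not values from SolarWinds.

```sql
-- List fragmented indexes in the current database (read-only).
SELECT
    OBJECT_SCHEMA_NAME(ips.object_id) AS SchemaName,
    OBJECT_NAME(ips.object_id)        AS TableName,
    i.name                            AS IndexName,
    ips.avg_fragmentation_in_percent,
    ips.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN sys.indexes AS i
    ON i.object_id = ips.object_id
   AND i.index_id  = ips.index_id
WHERE ips.index_id > 0                        -- skip heaps
  AND ips.avg_fragmentation_in_percent >= 10  -- assumed threshold
  AND ips.page_count >= 100                   -- assumed; tiny indexes rarely matter
ORDER BY ips.avg_fragmentation_in_percent DESC;

-- Possible follow-up for one table from the result (replace the name):
-- ALTER INDEX ALL ON dbo.SomeTable REORGANIZE;  -- lighter, always online
-- ALTER INDEX ALL ON dbo.SomeTable REBUILD;     -- heavier, full rebuild
```

Reorganizing is the closer equivalent of DBCC INDEXDEFRAG, while a rebuild corresponds to DBCC DBREINDEX.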

0 Kudos

When you run these commands, do you have Orion and all its pollers shut down?

0 Kudos

No, everything is running when I execute these commands.

0 Kudos
Level 9

Even though this thread is long dead, it turns up on the first page of Google, so rather than starting a new thread I figured I would add my experience with speed issues around the web console. Be *VERY* aware that this was just our experience and you will have to assess it against your own install, but considering how much pain this issue caused, hopefully it'll save someone their sanity.

* You can run queries on all nodes on your summary page; we were told it would slow down performance significantly

* SolarWinds' constant insistence that 'Server is too slow' is their way of saying they don't know what the issue is. That said, more RAM on the Web Console server is always good

* Even though 11.5.2 was a massive pain to upgrade to, it helped with performance (I know, I'm surprised as well)

* Just because Database Maintenance should be performing re-indexing doesn't mean it does. This single command, suggested by one of the SolarWinds engineers, pretty much resolved our performance issue. Page loads went from 25 seconds per node to pretty much 0-3 seconds

Exec sp_msForEachTable @COMMAND1= 'DBCC DBREINDEX ("?")'

* You'll have seen this before, but yes, definitely exclude the SolarWinds directories from AV scanning; it does make a difference

* Check all servers for disk paging, especially SQL servers. Perfmon with disk IO stats can help identify issues where the DB is waiting on the disk

* You can virtualize all servers, at least at our levels (11,000 elements / NPM+NCM+NTA3+VNQM). I understand why SolarWinds don't like it, due to contention for things like disk IO and over-subscription, and we were told over and over not to do it, but it works without issue for us

* Orion Hubble is always good to check out if you want to know the web console page load breakdown, especially for the SQL queries

As always YMMV
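To complement perfmon when checking whether the DB is waiting on the disk, as mentioned in the list above, SQL Server keeps its own per-file I/O stall counters. This is a minimal sketch; the figures are cumulative since the instance last started, so they show long-run averages rather than the current moment.

```sql
-- Average I/O stall per read and per write for every database file.
SELECT
    DB_NAME(vfs.database_id) AS DatabaseName,
    mf.physical_name,
    vfs.num_of_reads,
    vfs.num_of_writes,
    vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS AvgReadStallMs,
    vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS AvgWriteStallMs
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
    ON mf.database_id = vfs.database_id
   AND mf.file_id     = vfs.file_id
ORDER BY vfs.io_stall DESC;  -- files with the most total stall time first
```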

0 Kudos

I've been fighting this for over a year.  My use of Orion has suffered as a result.  About once a month I spend a day trying to optimize with minimal results.  Some type of a healthcheck/benchmark process would be helpful.  I received the same "it is your hardware" response from support.  Well, I keep throwing vCPUs and RAM at it, and I even have it running on a DAS SSD array to no avail.  I have Exchange and SharePoint running in the same private cloud and those are fine.  We have a great VMWare infrastructure with good SANs and hosts so I'm pretty convinced at this point it is a DB/Config/bad coding issue as you suggest.  Maybe part of this problem is the age of my install.  I'm going on almost 7 years and all of those upgrades/migrations (everything I'm running is latest ver) must have left a lot of detritus in the DB which can't be helping.  I'm about ready to start over, but I don't want to lose my history.

I tried your DB reindex command, seems to have helped a bit but still lagging.  I've rebuilt my DB Server to 2012R2 with SQL 2014, but perhaps time to rebuild the NPM Web/Polling Server also.

It would be nice if SW Support was more willing to assist with performance issues.  I think SAM is the main culprit.  Once that started being extended, things went downhill from there.  As you can see, I do not have a large install:

Orion
Module Name: Orion Platform
Version: 2015.1.2
Service Pack: None
Nodes currently monitored: 179
Total nodes in license: 1210
Volumes currently monitored: 190
Total volumes in license: 1210

NPM
License: Production
Product Name: Network Performance Monitor
Version: 11.5.2
Service Pack: None
Current number of interfaces: 459
Allowed number of interfaces: 500

SAM
Product Name: Server & Application Monitor
Version: 6.2.1
Service Pack: None
License: Production
Allowed Number of Component Monitors: 700
Total Number of Component Monitors: 376
Licensed Component Monitors: 376
Unlicensed Component Monitors: 0
Available Component Monitors: 324

Toolset
Product Name: Toolset
Version: 11.0.1
Service Pack: None
License: Production
Number of Activated Licenses: 2
Nodes in License(s): 10
Volumes in License(s): 10
Seats in License(s): 2
Seats Taken: 2
Seats Available: 0

IVIM
License:
Product Name: Integrated Virtual Infrastructure Monitor
Version: 2.1.0
Service Pack: None
Allowed number of sockets: 0
License Type: Unknown

DPA
Product Name: Database Performance Analyzer
Version: 9.2.0
Service Pack: None

Packet Analysis Sensors
License: Free Commercial
Version: 2.0
Service Pack: None
Server Packet Analysis Sensors currently used: 4
Total QoE Server Sensors: 10
Network Packet Analysis Sensors currently used: 0
Total QoE Network Sensors: 1

NTA
License: Production
Module Name: NetFlow Traffic Analyzer
Version: 4.1.1
Service Pack: None

0 Kudos

How much RAM do you have assigned, how many servers for the roles?

One thing we did (although in our case it ultimately made no difference) was to track the issue down ourselves.

* Enable Hubble

* Confirm where the execution delay is

* If issue is Database confirm server isn't maxed on the usual (CPU, memory)

* Drill down to Disk IO (We use Newrelic for this)

* Drill down to the split up for the IO (Read, write, etc)

* Look at what is happening at the IO (Is it blocked, queued, in some other wait state?)

The above might at least help you localise the issue, although it did lead us down the wrong path of thinking our SAN was oversubscribed, and SolarWinds' continual insistence that it was the issue didn't help. I enlisted our Infrastructure team and had them look at the SAN directly. We tracked the IO performance from where it leaves the compute layer to where it reaches the storage and could see no issue.
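For the last drill-down step (is the IO blocked, queued, or in some other wait state?), SQL Server can report what each active request is currently waiting on. A minimal sketch, meant to be run while the console is slow:

```sql
-- Active requests that are blocked or waiting, longest wait first.
SELECT
    r.session_id,
    r.blocking_session_id,   -- non-zero means another session is blocking this one
    r.status,
    r.wait_type,             -- PAGEIOLATCH_* waits point at disk reads
    r.wait_time AS WaitTimeMs,
    r.last_wait_type,
    t.text      AS SqlText
FROM sys.dm_exec_requests AS r
OUTER APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id <> @@SPID
  AND (r.blocking_session_id <> 0 OR r.wait_type IS NOT NULL)
ORDER BY r.wait_time DESC;
```

Lock waits with a non-zero blocking_session_id suggest a blocking problem rather than a storage one, which is useful to know before escalating to the SAN team.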

Overall in my experience with Solarwinds the following is a *general* trend

* CPU is always high (At least on our servers as we don't have infinite vCPUs)

* If you are near capacity for RAM you WILL see issues (Probably paging to disk)

* If you do something stupid (massive queries, complex views) it will let you. Its flexibility can be its undoing if you don't understand what it's doing

* Queries can sometimes go nuts, have seen it happen more so on 11.5.2 where I assume SQL ends up blocked

There is a performance integration function in the Administration area, but I haven't touched it. Not sure if it's worth turning on?

I have a very similar issue, although with a larger install: NPM, SAM, IPAM, NCM, Toolset, NTA, UDT and WPM. My main server is a 32-core, 32 GB hardware beast, and my 3 additional pollers are VMs with 4 cores and 8 GB each (being upgraded to 6 cores and 10 GB tomorrow).

We have constant issues, low virtual memory requiring reboots, services stop, you name it.

I am getting another poller to see if that helps, but support says we look great.

Our DB response time seems very slow.  We also have had this system upgraded since version 8.

A total rebuild would be impossible with the hundreds of custom apps, UnDP's and who knows what.

I will try the reindex as well.

Update any new findings here and maybe we can track down our slowness  🙂

0 Kudos

How did this go? Have you had any further issues? I would recommend at least 12 GB of RAM, if not 16.

0 Kudos

Have you tried changing how the SQL client connects to the SQL server? I have had pretty good success with that setting.

What aspect of Orion/NPM do you find most annoying?

Level 9

Hello Everyone,

Is it better to have a dedicated server for the SolarWinds DB, or is it OK to put the DB into the cloud, where it would sit alongside a lot of other virtual machines?

One reason I'm against the cloud is that the I/O for my DB will be affected, since a lot of VMs also run in our private cloud.

0 Kudos

Go with a dedicated server. Orion and the SQL database are latency sensitive, so communicating with the cloud would be a really bad idea.

Since the cloud made its debut in our team, they want to throw every system into it as a VM. I really feel that SolarWinds will suffer greatly if its DB is included in the cloud, and SolarWinds is a very critical tool for our operations.

0 Kudos

boomshine, I recommend not putting your DB in the cloud, so that it doesn't affect the performance of your Orion. Keep your SQL Server database on a separate physical server.

rickrocks appreciate your input

Level 21

Having fast hardware and storage throughput is also part of our strategy, as many others have suggested. We also use a separate dedicated web server and try not to overload the main system by offloading more polling to dedicated polling engines. Syslog and traps can also kill your system, so I make sure to keep a close eye on how both the syslog/trap rules and capacity are impacting my system.

We are running our entire architecture on Windows Server 2008 R2 in a completely virtualized environment (VMWare) with a SQL Server 2008 R2 using MS Failover Clustering.  Within the next month we will be moving the database to a dedicated system with a 1.5 TB Solid State RAID array but still virtualized for the flexibility.
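One way to keep an eye on syslog and trap growth, as mentioned above, is to list the largest tables in the Orion database and watch whether SYSLOG, TRAPS, or TRAPVARBINDS climb toward the top. A minimal sketch, to be run in the Orion database:

```sql
-- Ten largest tables by reserved space, with row counts.
SELECT TOP (10)
    t.name AS TableName,
    -- count rows once per table (heap or clustered index only)
    SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS RowCnt,
    -- pages are 8 KB each; include all indexes in the size
    SUM(ps.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t
    ON t.object_id = ps.object_id
GROUP BY t.name
ORDER BY SUM(ps.reserved_page_count) DESC;
```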

Level 13

Just upped my SQL server from 12 GB to 24 GB of RAM. It helped a bit. SQL specs:

sql.PNG

Web server specs:

polling.PNG

Sometimes the site is quick, but for the most part it is fairly slow.

According to appinsight:

compilations/recompilations are high (1.06)

work files created/sec are high (65.93)

worktables created/sec are high (19.49)
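The same kind of counters can be read directly from SQL Server for a second opinion. The values in sys.dm_os_performance_counters for per-second counters are cumulative, so this sketch samples them twice and divides by the interval. The 10-second interval is an arbitrary choice, and the results may not line up exactly with how AppInsight computes its figures.

```sql
-- First sample of the cumulative counters.
DECLARE @first TABLE (counter_name NVARCHAR(128), cntr_value BIGINT);

INSERT INTO @first (counter_name, cntr_value)
SELECT RTRIM(counter_name), cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec',
                       N'SQL Compilations/sec',
                       N'SQL Re-Compilations/sec',
                       N'Workfiles Created/sec',
                       N'Worktables Created/sec');

WAITFOR DELAY '00:00:10';  -- sampling interval (assumed: 10 seconds)

-- Second sample, converted to a per-second rate over the interval.
SELECT
    RTRIM(p.counter_name)                AS CounterName,
    (p.cntr_value - f.cntr_value) / 10.0 AS PerSecond
FROM sys.dm_os_performance_counters AS p
JOIN @first AS f
    ON f.counter_name = RTRIM(p.counter_name)
ORDER BY CounterName;
```

Comparing compilations against batch requests gives a sense of how much of the workload is being compiled fresh rather than reusing cached plans.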

0 Kudos

That looks like a lot of memory. Is your db server hosting more than the Orion NPM database? Are they VMs? Might want to look through Resource Monitor on your SQL server to find out how much memory it's using, and adjust accordingly. (I'm always suspicious of VMs with high memory requirements.)
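Alongside Resource Monitor, SQL Server can report its own memory use and its configured ceiling. A minimal sketch:

```sql
-- Memory the SQL Server process is actually using right now.
SELECT
    physical_memory_in_use_kb / 1024 AS SqlMemoryInUseMB,
    memory_utilization_percentage,
    process_physical_memory_low,     -- 1 means the process is under memory pressure
    process_virtual_memory_low
FROM sys.dm_os_process_memory;

-- The configured upper limit for SQL Server memory.
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'max server memory (MB)';
```

If the in-use figure sits at the configured maximum, SQL Server will take whatever it is given, so the limit rather than the observed usage is the number to adjust.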

One other trick to speed up Orion NPM is to remove all of the Thwack and community elements from each page. Unless you find them useful, you're making calls to an external site every time you load a page with those elements enabled. I always get rid of them (but don't worry, I have Thwack open in a browser tab) and it does make a difference in page load times.

0 Kudos

It's a physical box dedicated to just SolarWinds (I hijacked an old Exchange mailbox server). I did remove the Thwack web parts. The part that gets me is that sometimes it's really fast. Just the other day we were on it during a meeting, and another admin and I couldn't believe how fast it was!

I will try to get some solid states ordered, since we spent so much on all the different sw parts, why not spend a few more thousand for new hardware.

0 Kudos
Level 17