
Orion NPM Architecture, Speed, and SQL

In my last post, "What NPM Tips and Tricks do You Have?" I asked about tips and tricks, expecting a mashup of different things from all over the NPM world and to a certain extent that's what happened. Interestingly, however, a large section of the thread turned into a discussion about two things: maps and speed.

There were certainly a lot of good map tips, and you can find more at SolarWinds Labs. In fact, you can even find out how to make your boss happy with a Big Green Button.

The speed issue is particularly intriguing to me since, let's be honest, there are a lot of times when NPM is a bit of a dog when it comes to responsiveness. The web interface is notoriously slow, and gets even worse when you have a ton of custom widgets, doodads, and whatchamacallits loading on a screen. Several people mentioned that a lot of speed can be gained by working at the database level and pre-packaging certain queries.

ZachM wrote:

Stored procedures and custom views created in the DB save us countless man-hours and, in my experience, working directly in the DB can really expand your knowledge of NPM's overall architecture. I highly recommend that every SolarWinds engineer challenge themselves to learn more SQL. I am by no means a DBA, but I can pull every bit of data you can get from the website, and I can do it faster 90% of the time.
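
To illustrate the kind of "pre-packaging" ZachM describes, here is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for the Orion MSSQL database. That substitution is an assumption for illustration: T-SQL syntax and the real Orion schema differ. The tables `Nodes` and `ResponseSamples` and the view `v_NodeAvgResponse` are hypothetical, simplified placeholders, not the actual Orion schema.

```python
import sqlite3

# In-memory SQLite stands in for the Orion MSSQL database (assumption for illustration).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical, simplified tables; real Orion tables have many more columns.
cur.executescript("""
CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY, Caption TEXT);
CREATE TABLE ResponseSamples (NodeID INTEGER, SampleTime TEXT, AvgResponseMs REAL);

-- The "pre-packaged" query: one view everyone can reuse instead of
-- rewriting the join and aggregation by hand each time.
CREATE VIEW v_NodeAvgResponse AS
SELECT n.Caption, ROUND(AVG(r.AvgResponseMs), 1) AS AvgMs, COUNT(*) AS Samples
FROM Nodes n
JOIN ResponseSamples r ON r.NodeID = n.NodeID
GROUP BY n.NodeID, n.Caption;
""")

cur.executemany("INSERT INTO Nodes VALUES (?, ?)", [(1, "core-sw-01"), (2, "edge-rtr-01")])
cur.executemany(
    "INSERT INTO ResponseSamples VALUES (?, ?, ?)",
    [(1, "2013-01-01 00:00", 2.0), (1, "2013-01-01 00:05", 4.0), (2, "2013-01-01 00:00", 30.0)],
)

# Consumers (reports, custom widgets) just select from the view.
rows = cur.execute(
    "SELECT Caption, AvgMs, Samples FROM v_NodeAvgResponse ORDER BY Caption"
).fetchall()
for row in rows:
    print(row)
```

The design point is that the join and aggregation live in one place in the database; every report that selects from the view gets the same logic without repeating it.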

NPM is an incredibly flexible and extensible product, especially in recent revisions, and offers a lot of opportunity for people willing to really dig in behind the scenes. As usual, I have more questions:

* What SQL version and architecture are you using (separate database, named instances, etc.)?

* What architecture have you found helps in the speed department?

As an example of what I'm interested in: we run Cisco UCS servers with VMware as the hypervisor layer, backed by fully licensed NetApp FAS3240 arrays with Flash Cache, etc. We tier our storage manually and run full production SQL and Oracle instances virtualized. The storage connects to the UCS over an aggregated 80 Gbps, and the UCS connects to the core at 160 Gbps.

  • In my experience, NPM performance issues are due to a misconfigured SQL server. It's similar to a VDI project: things work well up to a certain point, at which you hit the IOPS limit on your LUN and all of your desktops grind to a halt. NPM is the same: an initial deployment with little to no stored data works well, even if your SQL server isn't configured to best practices (CPU / memory allocations, storing logs and databases on separate volumes, creating a sane maintenance plan, et cetera). But once you collect a few months' worth of data, maybe turn on syslog, and take the plunge to capture flows, SQL can't keep up. Build a solid SQL server, wrap it in proper maintenance, and NPM should be happy.

    I've also used another server for the web GUI to reduce load on the NPM server, but I did this in concert with migrating the NPM database to a proper MSSQL cluster, so I'm not certain how much performance benefit can be attributed to the web console being broken out. But I ended up with a traditional three-tiered web app architecture, and performance was no longer a problem.

    All were vSphere VMs, with an EMC backend and separate LUNs for each volume on the SQL server, and Windows Server 2003 R2 for all nodes (I admit, this was a while ago :) ). A maintenance plan is key, and tuning your NPM data retention settings is a big part of it, too. The older the data is, the more it should be summarized. YMMV.
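
The "older data, more summarization" rule can be sketched as a tiered roll-up. This is a minimal illustration only, not how Orion's nightly maintenance is actually implemented: the tier lengths (7 days of raw samples, 30 days of hourly averages, 365 days of daily averages) and the `roll_up` function are hypothetical, so check your own NPM retention settings.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean

# Hypothetical retention tiers (assumption; not necessarily your NPM settings).
DETAIL_DAYS = 7     # keep raw samples this long
HOURLY_DAYS = 30    # then keep hourly averages this long
DAILY_DAYS = 365    # then keep daily averages this long, then discard


def roll_up(samples, now):
    """Summarize (timestamp, value) samples more aggressively as they age.

    Recent samples are kept as-is, older ones are averaged per hour,
    older still per day, and anything past DAILY_DAYS is dropped.
    Returns a sorted list of (timestamp_or_bucket_start, value).
    """
    kept = []
    hourly = defaultdict(list)
    daily = defaultdict(list)
    for ts, value in samples:
        age = now - ts
        if age <= timedelta(days=DETAIL_DAYS):
            kept.append((ts, value))
        elif age <= timedelta(days=HOURLY_DAYS):
            hourly[ts.replace(minute=0, second=0, microsecond=0)].append(value)
        elif age <= timedelta(days=DAILY_DAYS):
            daily[ts.replace(hour=0, minute=0, second=0, microsecond=0)].append(value)
        # else: older than the longest retention window, so it is discarded
    kept += [(bucket, mean(vals)) for bucket, vals in hourly.items()]
    kept += [(bucket, mean(vals)) for bucket, vals in daily.items()]
    return sorted(kept)


now = datetime(2013, 6, 1)
samples = [
    (now - timedelta(days=1), 10.0),       # recent: kept at full detail
    (datetime(2013, 5, 15, 9, 5), 20.0),   # ~17 days old: 09:00 hourly bucket
    (datetime(2013, 5, 15, 9, 50), 40.0),  # same hourly bucket
    (datetime(2013, 1, 10, 3, 0), 5.0),    # ~141 days old: daily bucket
    (datetime(2013, 1, 10, 22, 0), 15.0),  # same daily bucket
    (datetime(2011, 1, 1), 99.0),          # past retention: dropped
]
result = roll_up(samples, now)
print(result)
```

Six raw samples collapse to three rows. That shrinking row count is why a roll-up that silently stops running, leaving months of full-detail data behind, hurts query performance.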

  • Good tips for sure. We've thought about moving the web front-end for a while now... just haven't gotten to it yet. :)

  • One other thing I found was to make sure your roll-up is actually happening. Something broke once in my nightly roll-up, and it was a month or two until I noticed. That meant an extra month or two of full-detail data, which really slowed things down for me. So watch c:\ProgramData\Solarwinds\Logs\Orion\swdebugMaintenance.log for the string "] Error ".

    - The biggest downside of SSD is GB/$. SSD seems *really* expensive when compared to 15k disks, but when you look at it as IOPS/$, SSD usually beats 15k, especially if you add IOPS/kWh and/or IOPS/BTU.
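
A small watcher for that log might look like the following sketch. The path and the `"] Error "` marker come from the post; the function names are hypothetical, and the sketch assumes the log is plain text that can be read as UTF-8. Run on a schedule, something like this could flag a broken roll-up in days rather than months.

```python
from pathlib import Path

# Log path and marker string as given in the post; adjust if your install differs.
LOG_PATH = Path(r"c:\ProgramData\Solarwinds\Logs\Orion\swdebugMaintenance.log")
MARKER = "] Error "


def find_maintenance_errors(lines, marker=MARKER):
    """Return (line_number, text) for every line containing the marker."""
    return [
        (n, line.rstrip("\n"))
        for n, line in enumerate(lines, start=1)
        if marker in line
    ]


def scan_log(path=LOG_PATH):
    # errors="replace" keeps the scan going if the log contains odd bytes.
    with open(path, encoding="utf-8", errors="replace") as fh:
        return find_maintenance_errors(fh)


if __name__ == "__main__":
    if LOG_PATH.exists():
        for n, text in scan_log():
            print(f"{n}: {text}")
    else:
        print(f"Log not found: {LOG_PATH}")
```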

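The GB/$ versus IOPS/$ argument is simple arithmetic, and the sketch below works through it. The capacities, IOPS ratings, prices, and wattages are placeholder figures invented for illustration, not vendor specs or real quotes, so substitute your own numbers. The power comparison is expressed as IOPS per watt, a rate-based stand-in for the IOPS/kWh idea above.

```python
from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    capacity_gb: float
    iops: float
    price_usd: float
    watts: float

    def gb_per_dollar(self) -> float:
        return self.capacity_gb / self.price_usd

    def iops_per_dollar(self) -> float:
        return self.iops / self.price_usd

    def iops_per_watt(self) -> float:
        return self.iops / self.watts


# Placeholder figures for illustration only; not real specs or prices.
hdd = Drive("15k SAS", capacity_gb=600, iops=180, price_usd=400, watts=8)
ssd = Drive("SSD", capacity_gb=400, iops=20000, price_usd=1200, watts=4)

for d in (hdd, ssd):
    print(
        f"{d.name:8} GB/$={d.gb_per_dollar():6.2f}  "
        f"IOPS/$={d.iops_per_dollar():7.2f}  IOPS/W={d.iops_per_watt():8.1f}"
    )
```

With these made-up inputs the 15k disk wins on GB/$, while the SSD wins by a wide margin on IOPS/$ and IOPS/W. That matches the point in the post: the better metric depends on whether the database is capacity-bound or I/O-bound.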