
Running SolarWinds on VMware - Best Practices?

I've noticed at two different companies now that when we changed over from SNMP to WMI, everything runs slower and we have to constantly reboot the primary poller and/or additional pollers to reclaim resources, as CPU and memory get consumed and not released.  Does anyone have best practice tips for setting up the virtual guest to handle the heavy load that SolarWinds puts on the VM?

  • Which process or processes are consuming the majority of resources on the poller? Also, what operating system are your pollers running on, and which Orion products and versions are installed on them? What you're describing should not be occurring; it's a symptom of an underlying problem rather than the problem itself. We would likely need to look through your diagnostics, and possibly even a memory dump from your system, to determine the cause. I would recommend opening a case with support so we can get to the bottom of the issue, as constantly rebooting your Orion pollers is not an acceptable solution for any environment.

  • We are running Windows Server 2012 on the primary poller with these modules: Orion Platform 2014.2.1, SAM 6.1.1, QoE 1.0, IPAM 4.2, NCM 7.3.2, NPM 11.0.1, NTA 4.0.3, Toolset 11.0.0, IVIM 1.10.0, VNQM 4.2

    I got support on the line today and they did all of these things:

    1. Added an additional 2 GB of RAM to the Orion server.
    2. Enabled defragmentation of the Orion database.
    3. Truncated the Traps and TrapVarbinds tables.
    4. Removed the JE, JE2, and collector services.
    5. Cleared the pendingnotifications/subscriptags/subscriptions tables (a rough sketch of steps 3 and 5 follows this list).
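
    For reference, here's a rough sketch of what the table cleanup in steps 3 and 5 might look like, using Python with pyodbc against the Orion database. The connection string and database name are placeholders, the table names are just the ones support mentioned, and support may well have done it differently (e.g. straight in Management Studio), so treat this as illustration only and check with support before touching a production database.

    ```python
    # Illustration only - table names are as listed by support above; confirm
    # the exact cleanup procedure with SolarWinds support before running this.
    import pyodbc

    # Placeholder connection details; adjust server, database, and auth.
    conn = pyodbc.connect(
        "DRIVER={ODBC Driver 17 for SQL Server};"
        "SERVER=SQLSERVER01;DATABASE=SolarWindsOrion;Trusted_Connection=yes;",
        autocommit=True,
    )
    cur = conn.cursor()

    # Step 3: clear trap history. Child table first; if foreign keys block
    # TRUNCATE, a plain DELETE may be what support actually used.
    for table in ("TrapVarbinds", "Traps"):
        cur.execute(f"TRUNCATE TABLE {table}")

    # Step 5: clear the notification/subscription tables support listed
    # (names as mentioned in the thread; SQL Server is usually case-insensitive).
    for table in ("pendingnotifications", "subscriptags", "subscriptions"):
        cur.execute(f"DELETE FROM {table}")

    conn.close()
    ```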

    I'm still curious if anyone tweaks their virtual guest settings, such as:

    • DRS rule to keep the primary poller and database server together and/or separated
    • Setting memory or CPU shares to HIGH
    • Setting reservations for CPU
    • Reserving all the VM guest memory
    • Keeping resource allocation unlimited or capping it
    • Running the E1000 or VMXNET3 NIC driver
    • Running one or all of the disks on the VMware Paravirtual SCSI controller instead of LSI Logic SAS
    • Keeping the default of no affinity or changing it
    • Keeping Hyperthreaded Core Sharing set to 'Any' or changing the mode to 'None' or 'Internal'
    • Any special options turned on/off
    • Keeping CPU/MMU Virtualization set to Automatic or something else
    • etc, etc, etc....
  • I would recommend upgrading to IPAM 4.3, NTA 4.1, NPM 11.5, and SAM 6.2. Hundreds of bugs have been fixed between those releases.

  • I plan on upgrading in a few weeks, once support gets caught up with all the calls coming in from the new upgraders.  The funny thing is that I've started monitoring the RPCSS process, and it shows a big jump in CPU consumption around the same time the environment goes to crap and all the SAM and Hardware Health statuses change to UNKNOWN.  I plan on switching to agent-based polling to avoid this problem going forward.

  • I think we've gotten off the original topic a bit: what options are other people using when setting up their virtual guests? I'm also curious what SolarWinds support suggests for the following:

    • More virtual sockets or more cores per socket
    • Which SCSI controller: Paravirtual or something else?
    • VMXNET 3 NIC, one NIC or team?
    • Thin or thick provisioned disks?
    • Any advanced option changes
    • Special swapfile location
    • High shares or normal
    • Reservations
    • Affinity rules
  • Orion performance is primarily driven by the database server. On the Orion poller itself, virtual or physical, CPU is the most likely bottleneck you'll encounter; it's recommended that you have at least one CPU core per product installed. Disk I/O is normally not the constraint on the Orion server; it's the most typical performance bottleneck on the SQL server.
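
    Related to the core-per-product guideline: if you want a quick way to double-check what a poller VM actually has allocated without clicking through the vSphere client, something like this pyVmomi sketch works. The vCenter address, credentials, VM name, and product list are all placeholders for your own values.

    ```python
    # Quick check of a poller VM's vCPU/RAM allocation against the rough
    # "at least one core per installed Orion product" guideline above.
    # vCenter host, credentials, VM name, and product list are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    products = ["NPM", "SAM", "NTA", "NCM", "IPAM", "VNQM", "QoE", "IVIM"]

    ctx = ssl._create_unverified_context()  # lab shortcut; use real certs in prod
    si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vim.VirtualMachine], True)
        for vm in view.view:
            if vm.name == "ORION-PRIMARY":
                cpus = vm.config.hardware.numCPU
                mem_mb = vm.config.hardware.memoryMB
                print(f"{vm.name}: {cpus} vCPU, {mem_mb} MB RAM")
                if cpus < len(products):
                    print("Fewer vCPUs than installed products - consider more cores.")
    finally:
        Disconnect(si)
    ```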

  • What about setting up DRS affinity rules to either keep vm's together and/or separate?

    • Primary poller and Database server; Separate/Together?
    • Primary poller and additional webserver; Separate/Together?
    • Additional webserver and database server; Separate/Together?
    • Primary poller and additional pollers; Separate/Together?
    • Primary poller and NTA server; Separate/Together?
  • The major consideration with affinity would be resource contention. Keeping them close together is certainly preferable, but can your host handle a primary poller and a DB server on the same box? I/O on the datastores will also be a major concern. The VMware best practices guidance for SQL Server would be an advisable read.

  • We have made some impressive ground by changing the SCSI controller to paravirtual (for all disks), and CPU usage has come down by about 20%.  We still have to reboot every morning because of the RPC issue, but if the system stays this stable, life will be good. Of course, I did add 4 more cores recently, so my results may be off :)

    ...let me drop back down to 4 cores and report my findings.

    The current DRS rules I have set are to keep the DB, Primary Poller, and Additional Poller all together.  Then we have another rule to keep the NTA and Primary Poller separate from each other.
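
    In case it's useful to anyone scripting this, below is a rough pyVmomi sketch of what those two rules look like when created through the API instead of the vSphere client. The cluster and VM names are placeholders for whatever your environment uses, and normally you'd just set these up in the client.

    ```python
    # Rough sketch: create DRS VM-VM rules like the ones described above
    # (keep DB + pollers together, keep NTA and the primary poller apart).
    # Cluster and VM names are placeholders.
    import ssl
    from pyVim.connect import SmartConnect, Disconnect
    from pyVmomi import vim

    def find_by_name(content, vimtype, name):
        # Walk the inventory and return the first object with a matching name.
        view = content.viewManager.CreateContainerView(
            content.rootFolder, [vimtype], True)
        for obj in view.view:
            if obj.name == name:
                return obj
        return None

    ctx = ssl._create_unverified_context()
    si = SmartConnect(host="vcenter.example.com", user="admin", pwd="secret",
                      sslContext=ctx)
    try:
        content = si.RetrieveContent()
        cluster = find_by_name(content, vim.ClusterComputeResource, "PROD-CLUSTER")
        db = find_by_name(content, vim.VirtualMachine, "ORION-DB")
        primary = find_by_name(content, vim.VirtualMachine, "ORION-PRIMARY")
        additional = find_by_name(content, vim.VirtualMachine, "ORION-POLLER2")
        nta = find_by_name(content, vim.VirtualMachine, "ORION-NTA")

        # Keep the DB, primary poller, and additional poller together.
        keep_together = vim.cluster.AffinityRuleSpec(
            name="orion-db-and-pollers-together", enabled=True,
            vm=[db, primary, additional])
        # Keep NTA and the primary poller apart.
        keep_apart = vim.cluster.AntiAffinityRuleSpec(
            name="orion-nta-away-from-primary", enabled=True, vm=[nta, primary])

        spec = vim.cluster.ConfigSpecEx(rulesSpec=[
            vim.cluster.RuleSpec(info=keep_together, operation="add"),
            vim.cluster.RuleSpec(info=keep_apart, operation="add"),
        ])
        cluster.ReconfigureComputeResource_Task(spec, modify=True)
    finally:
        Disconnect(si)
    ```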

  • Do you recall what the process was for clearing PendingNotifications?

    Thanks,

    Joe