
SNMP_vs_WMI_20130412.docx

NOTE: The spreadsheet with the data behind this information appears as a separate upload. See here:

SolarWinds’ (relatively) new technique of monitoring Windows servers via WMI instead of SNMP has a measurable but manageable impact on both the target and the polling engine.

On a target server, monitoring with WMI plus a SAM template had no effect whatsoever on RAM or CPU (compared to simple ping monitoring), although it did add an average of 12 Kbps of bandwidth.

The difference between WMI and SNMP polling was even smaller, with a 4 Kbps bandwidth bump being the only noticeable effect.

On the polling engine the impact was more pronounced: monitoring 300 servers via WMI with a SAM template included (the most aggressive monitoring combination) resulted in the following increases compared to monitoring with simple “ping”:

  • a 16% increase in average CPU utilization
  • a 4% increase in average RAM usage
  • and a 4Mbps increase in incoming bandwidth.

The difference between monitoring 300 machines with WMI vs. SNMP was smaller still, on average:

  • 6% CPU
  • 2% RAM
  • 2.5Mbps bandwidth received.
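To put the poller-side bandwidth numbers in per-server terms, here is a quick back-of-the-envelope sketch. It assumes the load is spread evenly across the 300 servers and that 1 Mbps = 1000 Kbps; the function and variable names are just illustrative.

```python
# Poller-side bandwidth deltas reported above, measured with 300 monitored servers.
SERVERS = 300


def per_server_kbps(total_mbps: float, servers: int = SERVERS) -> float:
    """Spread a poller-side bandwidth delta (in Mbps) evenly across servers, in Kbps."""
    return total_mbps * 1000 / servers


wmi_sam_vs_ping = per_server_kbps(4.0)  # WMI + SAM template vs. simple ping
wmi_vs_snmp = per_server_kbps(2.5)      # WMI vs. SNMP

print(f"WMI+SAM vs ping: {wmi_sam_vs_ping:.1f} Kbps per server")
print(f"WMI vs SNMP:     {wmi_vs_snmp:.1f} Kbps per server")
```

The first figure works out to roughly 13 Kbps per server, which is in the same ballpark as the 12 Kbps average increase seen on the target side.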

If, as the executive summary states, the difference between WMI and SNMP polling is (statistically) negligible, then why the need for additional hand-wringing? Why not just make the switch and go?

The answer is that the choice of polling method has other impacts beyond the physical toll on the machines involved. Functionally, there are some pros and cons to be weighed:

SNMP polling (as compared to WMI)

  • CON Cannot monitor Windows Volume Mount points
  • CON Challenges with earlier versions of Windows (NT, W2k)
  • CON Requires additional non-standard configuration actions (enabling the SNMP agent, etc.)
  • PRO Fewer ports for enterprise firewall rules
  • PRO No single point of failure for access
  • CON Changing the SNMP community string requires enterprise-wide changes
  • CON Uses SNMP service start time for uptime metrics
    • Work-around: set up a UnDP for hrSystemUptime
  • PRO Extremely efficient use of CPU, RAM and bandwidth (on both target and poller)
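As a rough illustration of the uptime point above: both the standard `sysUpTime` object (time since the SNMP agent was last re-initialized) and `hrSystemUptime` from HOST-RESOURCES-MIB (time since the host itself was last initialized) come back as TimeTicks, i.e. hundredths of a second. The sketch below only does the conversion; the sample tick values are made up for illustration, and actually fetching the OIDs is left to whatever SNMP tool you use.

```python
from datetime import timedelta

# Standard scalar OIDs (instance .0).
SYS_UPTIME_OID = "1.3.6.1.2.1.1.3.0"           # sysUpTime: since the SNMP agent started
HR_SYSTEM_UPTIME_OID = "1.3.6.1.2.1.25.1.1.0"  # hrSystemUptime: since the host was initialized


def timeticks_to_timedelta(ticks: int) -> timedelta:
    """Convert SNMP TimeTicks (hundredths of a second) to a timedelta."""
    return timedelta(seconds=ticks / 100)


# Hypothetical readings: the SNMP service was restarted an hour ago,
# but the server itself has been up for a full day.
agent_uptime = timeticks_to_timedelta(360_000)
host_uptime = timeticks_to_timedelta(8_640_000)

print(f"sysUpTime      -> {agent_uptime}")
print(f"hrSystemUptime -> {host_uptime}")
```

A gap like the one in this made-up example is why the service start time can understate how long the box has really been up.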

WMI polling (as compared to SNMP)

  • CON WMI-only devices cannot use custom pollers (UnDP)
    • Work-around: If the machine has EVER been an SNMP-polled device, the SNMP info is retained and custom pollers can be used (at least until the SNMP RO string changes)
  • PRO Settings used by SAM automatically
  • CON Significantly more firewall ports required
    • Work-around: per-server config can nail down WMI to just a couple of ports
  • CON Will not work across a NAT-ed WAN connection (VPN, etc.)
  • CON One password change in AD can cripple monitoring
  • CON Cannot monitor topology
  • PRO Uses REAL reboot time for uptime metrics
  • CON Less efficient (vis-à-vis SNMP) use of CPU, RAM and bandwidth on both target and poller
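For the "REAL reboot time" point, here is a minimal sketch of turning a WMI boot timestamp into an uptime. It assumes the value comes from the `LastBootUpTime` property of `Win32_OperatingSystem` (an assumption; the write-up doesn't say which property is polled), which WMI returns in CIM datetime format: `yyyymmddHHMMSS.mmmmmm` followed by a signed UTC offset in minutes. The sample string and the function name are made up for illustration.

```python
from datetime import datetime, timedelta, timezone


def parse_cim_datetime(value: str) -> datetime:
    """Parse a CIM datetime string such as '20130412083000.000000-300'."""
    # First 21 characters: date, time and microseconds.
    stamp = datetime.strptime(value[:21], "%Y%m%d%H%M%S.%f")
    # Remaining characters: signed offset from UTC, in minutes.
    offset_minutes = int(value[21:])
    return stamp.replace(tzinfo=timezone(timedelta(minutes=offset_minutes)))


# Hypothetical value as it might be returned for LastBootUpTime.
boot = parse_cim_datetime("20130412083000.000000-300")
uptime = datetime.now(timezone.utc) - boot

print(f"Last boot: {boot.isoformat()}")
print(f"Uptime:    {uptime}")
```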
  • This is awesome. I just got asked to do a similar comparison. There are added benefits to volume monitoring via WMI, like the DiskIO stuff out of the box. My main concern, which I still have to test, is clustered servers. If I have to monitor a server 5 times (once for each virtual instance, because it might move), how will that impact the box? Right now, hitting them 5 times with SNMP is negligible.

  • Just a note: as of core 2015.1, I am no longer able to get custom pollers to work on WMI nodes, whether or not SNMP was ever used.

    WMI Monitoring (as compared to SNMP)

    • CON WMI-only devices cannot use custom pollers (UnDP).
      • Work-around: If the machine has EVER been an SNMP polled device, the snmp info is retained and custom pollers can be used (at least until the SNMP RO string changes)
  • adatole,

    Just wanted to make a note of something here. The document doesn't mention what the polling interval was set to. I assume it was set to 300 seconds; is that correct?

  • Good catch!

    The polling intervals for this test were:

    • ping (plus interface and disk up/down) every 2 minutes
    • standard statistics every 5 minutes
    • interface statistics every 7 minutes
    • disk statistics every 15 minutes
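To get a feel for what those intervals mean in volume, here is a rough sketch that converts them into polling cycles per hour, per node and for the 300 servers in the test. It counts one cycle per node per interval; in practice a single cycle may touch several interfaces or volumes, so treat the totals as a lower bound. The names are illustrative only.

```python
# Polling intervals from the test, in minutes.
INTERVALS_MIN = {
    "ping / up-down status": 2,
    "standard statistics": 5,
    "interface statistics": 7,
    "disk statistics": 15,
}
NODES = 300


def cycles_per_hour(interval_minutes: float) -> float:
    """Number of polling cycles that fit into one hour at the given interval."""
    return 60 / interval_minutes


per_node = {name: cycles_per_hour(minutes) for name, minutes in INTERVALS_MIN.items()}
total_per_node = sum(per_node.values())
total_all_nodes = total_per_node * NODES

for name, rate in per_node.items():
    print(f"{name:<24}{rate:6.1f} cycles/hour per node")
print(f"{'total':<24}{total_per_node:6.1f} cycles/hour per node")
print(f"all {NODES} nodes: {total_all_nodes:.0f} cycles/hour")
```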