Mommy, Where Does Host Performance Data Come From?

Data gaps.  They are the bane of every monitoring engineer's life.  No data means no alerts (unless you are alerting on no data, which is another topic entirely), and no alerts means that when something goes wrong, customers know before the engineers who can fix the problem.  That is bad for business.  It is also bad for trust, and trust is all that we have.  (https://www.linkedin.com/pulse/20140901182532-13406765-honesty-is-the-best-policy)

With that preamble, let me set the stage for the problem I am having.

All of our VMware hosts are SNMP enabled.  That is a relatively recent thing for us so not all of them are actually collecting data via SNMP.  Fortunately we have VMAN which collects data using the vCenter API.  Another engineer reached out to me indicating that a VMware host had gaps (actually, *has* gaps) in some of the metrics charts.  His specific concern was memory data.

I checked the VIM_HostStatistics_Detail table and found no gaps in the data.  (We poll that particular data in VMAN every 15 minutes because the data set is *huge*.)

I checked the CPULoad_Detail table for the same host and found gaps in the data of up to 1 hour.  I would have expected to see data at 15 minute intervals assuming that VMAN is populating that data instead of SNMP.
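
For anyone wanting to quantify gaps like these rather than eyeball them, here is a minimal sketch. It assumes the sample timestamps for one host have already been exported from the table into a Python list; the function name `find_gaps`, the one-minute tolerance, and the sample timestamps are all made up for illustration and are not part of any SolarWinds tooling.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_interval, tolerance=timedelta(minutes=1)):
    """Return (start, end, duration) for each pair of consecutive samples
    that are further apart than expected_interval + tolerance."""
    ordered = sorted(timestamps)
    gaps = []
    for previous, current in zip(ordered, ordered[1:]):
        delta = current - previous
        if delta > expected_interval + tolerance:
            gaps.append((previous, current, delta))
    return gaps


# Hypothetical samples: 15 minute polling with one missing hour of data.
base = datetime(2020, 1, 1, 0, 0)
samples = [base + timedelta(minutes=m) for m in (0, 15, 30, 90, 105)]

gaps = find_gaps(samples, expected_interval=timedelta(minutes=15))
for start, end, duration in gaps:
    print(f"gap of {duration} between {start} and {end}")
```

Running the same check against both tables with the expected interval set to 15 minutes would show whether the gaps exist in only one of them.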

Yes, the node is SNMP-enabled.  Oddly enough, CPU & memory data are NOT being collected via SNMP on this node (which I will fix in a minute), which leads me to believe even more strongly that the data from the VIM_HostStatistics_Detail table is being parsed into the CPULoad_Detail table (and others) for display in NPM.

My questions are :

1)  Is my assumption about the flow of data from VMAN to NPM correct?

2)  If a host is polled via SNMP for CPU and memory data does the SNMP or VMAN data take precedence assuming that my assumption in question 1 is correct?

3)  What process(es) are involved in the ingestion of that data, and why, if the data is present in the VIM_HostStatistics_Detail table, would it not be reflected in the CPULoad_Detail table?

Screenshots:

Here is the view of the data from the CPULoad_Detail table:

pastedImage_1.png

And here is the same time interval from the VIM_HostStatistics_Detail table

pastedImage_2.png

  • I am trying to build MHz usage charts by stacking VM-level usage. Someone else using SAR data says the VMAN 6.5 based charts are not showing the correct usage. I am seeing missing data as well. Could it be that VMAN is not using vCenter to poll the data? And if it is an SNMP issue, why?

    It is kind of hard to drive a fast car without an indication of remaining fuel, speed, etc.

  • If a node is polled via SNMP, CPU and memory utilization are polled via SNMP.

    If a node is polled via SNMP and it is also an ESX host, both protocols are used (SNMP + VMware API). As a result, the CPULoad view contains statistics from both SNMP and the VMware API. The VIM_HostStatistics view contains only data from the VMware API.

    If a node is polled via ICMP and it is an ESX host, CPU and memory statistics are polled via the VMware API. They are stored in both CPULoad and VIM_HostStatistics.

    I hope I answered your question.

  • Is there a best practice for ESX hosts?  Do we need to poll them via both SNMP and the VMware API?

    The issue we've found is that we poll less frequently in VMAN due to the size of our environment (15 minute intervals) and at 5 minute intervals in NPM.  Is it safe to assume that the data displayed in the NPM widgets is from SNMP and the data displayed in the VMware-centric (IVIM) widgets is from the API?
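
The polling rules from the reply above can be restated as a small lookup, which may help when reasoning about which source feeds which view. This is only a sketch of the rules as described in this thread, not of how the product is implemented; the function name `expected_sources` is hypothetical, and the ICMP case for a non-ESX node is an assumption because the thread does not cover it.

```python
def expected_sources(polling_method, is_esx_host):
    """Return (CPULoad sources, VIM_HostStatistics sources) as two sets,
    following the rules described in the thread."""
    method = polling_method.upper()
    if method not in ("SNMP", "ICMP"):
        raise ValueError(f"unsupported polling method: {polling_method}")

    cpu_load = set()
    vim_host_statistics = set()

    # SNMP-polled nodes get CPU and memory utilization via SNMP.
    if method == "SNMP":
        cpu_load.add("SNMP")

    # ESX hosts are also polled through the VMware API; per the reply,
    # that data lands in both CPULoad and VIM_HostStatistics.
    if is_esx_host:
        cpu_load.add("VMware API")
        vim_host_statistics.add("VMware API")

    # Assumption: an ICMP-only, non-ESX node has no CPU/memory source.
    return cpu_load, vim_host_statistics


for method, esx in (("SNMP", False), ("SNMP", True), ("ICMP", True)):
    cpu, vim = expected_sources(method, esx)
    print(method, "ESX" if esx else "non-ESX", sorted(cpu), sorted(vim))
```

Under these rules, an SNMP-polled ESX host has two writers into CPULoad, so differing poll intervals (5 minutes in NPM, 15 minutes in VMAN) would show up as mixed spacing in that view.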