missing ESXi hosts in SAM / VMAN (basic)

SAM / VMAN does "basic" polling of most (~40) of our ESXi hosts connected to vCenter but is missing 3. The SolarWinds server can ping those three and resolve their hostnames, yet they do not appear in "Virtualization Summary". The hosts are in vCenter (7.03.01500), and their configuration is no different from that of all the other hosts (a mixture of 6.7, 7.0 and 7.03).

The "Add a vCenter, Hyper-V host, standalone ESX host, or Nutanix cluster for monitoring in VMAN" article says:

As soon as you've finished adding the vCenter, Hyper-V host, standalone ESX host, or Nutanix cluster, VMAN begins monitoring it.

... yet I am not having such luck. Any idea what I might be doing wrong?

Running SELECT TOP 1000 * FROM [dbo].[VIM_Hosts] in Database Manager doesn't show the missing hosts. They are not anywhere else in SolarWinds either (e.g. as ICMP nodes).
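
To pin down exactly which hosts are missing, the vCenter inventory can be diffed against what SolarWinds reports. Below is a minimal Python sketch, assuming both lists have already been exported by hand (for example, host names copied from the vSphere client and from the VIM_Hosts query result). The function name and the sample host names are hypothetical, and comparing on the lowercase short name is an assumption about how the two systems might spell the same host differently.

```python
import ipaddress


def _match_key(name):
    """Normalize a host name for comparison (assumption: short name is unique)."""
    name = name.strip().lower()
    try:
        # Leave IP addresses intact; splitting on "." would mangle them.
        ipaddress.ip_address(name)
        return name
    except ValueError:
        # Compare on the short name so "ESX01" matches "esx01.example.com".
        return name.split(".")[0]


def missing_hosts(vcenter_hosts, solarwinds_hosts):
    """Return vCenter host names that have no match in the SolarWinds list."""
    known = {_match_key(h) for h in solarwinds_hosts}
    return sorted(h for h in vcenter_hosts if _match_key(h) not in known)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the environment described here.
    vcenter = ["esx01.example.com", "esx02.example.com", "esx03.example.com"]
    solarwinds = ["ESX01", "esx03.example.com"]
    print(missing_hosts(vcenter, solarwinds))  # ['esx02.example.com']
```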

Thanks!

(NPM, SAM 2023.1.0 with basic VMAN, polling ESXi hosts via vCenter. No direct ESXi polling.)

  • If you use the service account you're using for VMAN and log into vSphere, can you see those 3 hosts there? I want to make sure a permissions restriction on the service account isn't hiding those 3 missing hosts.

    Other thoughts: Do those 3 hosts have overlapping IP addresses with existing nodes in SolarWinds?

    Are you able to manually add those 3 ESX hosts as ICMP nodes and add VMware polling?

  • Yes, verified the service account: full permissions, can see the hosts that SolarWinds doesn't pick up.

    The IPs don't overlap with other nodes in SolarWinds. (We have under 100 nodes so far, so it's easy to verify, yet I'll double-check by doing a search in Database Manager.)

    Yes, I can add the hosts as ICMP nodes. Enabling VMware polling on them prompts for ESXi credentials.

    Entering those credentials (they're stored and confirmed good) and running "test" results in near-immediate (under 2 seconds) "Warning Test Failed. Cannot login with selected vCenter or ESX credential."

    I do not see any "Cannot login root@<IP address>" events in the logs. (Deliberately failing a direct login with a bad password does produce that kind of event.) In other words, the SolarWinds application can't reach, or is not set up to reach, the login page of the target ESXi host.

    I can navigate to the same host (IP or hostname) in a browser on the SolarWinds server and log in to it using the same credentials.
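
A near-immediate failure can mean the connection is being rejected rather than timing out, and that can be checked from a script as well as from a browser. Here is a minimal Python sketch, assuming the ESXi host serves its web UI and API over TCP 443 (the default). The function name is hypothetical, and this only tests the TCP handshake, not the login itself.

```python
import socket


def probe_tcp(host, port=443, timeout=5.0):
    """Classify a TCP connection attempt as 'open', 'refused', 'timeout' or 'error'."""
    try:
        # create_connection resolves the name and completes the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered but nothing is listening (or a firewall sent a reset).
        return "refused"
    except socket.timeout:
        # No answer within the timeout, e.g. packets silently dropped.
        return "timeout"
    except OSError:
        # Anything else, including DNS resolution failures.
        return "error"
```

Run on the SolarWinds server, for example as `probe_tcp("<ESXi host IP or hostname>")`. A result of "open" agrees with the browser test and points away from basic network reachability as the cause.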

  • A miracle.

    (I ought to sacrifice something tonight at the altar of solar prominences.)

    Added one of the missing hosts as an ICMP node in SolarWinds with no other changes. (Did not try enabling VMware polling.)

    34 minutes later it became part of "virtualization summary" and became a polled ESXi host.

    Adding two other missing hosts the same way, and saying a prayer.

    (Thanks for the idea!)

  • The last two missing hosts remain purely ICMP 2 hours later, i.e. SolarWinds still doesn't see them as ESXi hosts.

  • Those last two - still can't add them to SolarWinds. Will try submitting a support request. (Is VMAN Basic supported as part of Orion?)

  • SolarWinds support came through with a list of instructions below, yet it all boiled down to enabling an esoteric "MatchVMWareHostsOnBiosID" checkbox in "Advanced Configuration" in SolarWinds, and the missing hosts showed up. One related KB article actually shows up in web search for "MatchVMWareHostsOnBiosID":

    Full text of SolarWinds Support's instructions (edited substantially; screenshots added):

    First, perform isolation in the vCenter Managed Object Browser (MOB) as follows:

    • Access (vCenter IP address or hostname)/mob via a browser, using the same credentials with which the vCenter was added to Orion. (Doing so directly from the SolarWinds server instance may be a good idea.)
    • Navigate to 'content' > 'rootFolder' (most likely named "group-d1")
    • In 'childEntity' block (expand by clicking "more" if needed), find the datacenter that is missing in SolarWinds VMAN, click on it


    • choose (click on) the value in 'hostFolder' row:



    • Under 'childEntity' look for the ESXi hosts that are missing in SolarWinds. If present, proceed to the next step. If not, it's a VMware issue, not a SolarWinds one.
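
Each click in the walk above lands on a MOB page addressed by a managed object ID, so the pages can be bookmarked or revisited directly. Here is a minimal Python sketch of building those links, assuming the `/mob/?moid=...` query format that the MOB shows in the browser address bar. The function name is hypothetical, and only "group-d1" comes from the steps above; any other ID has to be read off your own MOB pages.

```python
from urllib.parse import urlencode


def mob_url(vcenter, moid="ServiceInstance", do_path=None):
    """Build a Managed Object Browser URL for one managed object."""
    params = {"moid": moid}
    if do_path:
        # doPath drills into a property of the object, e.g. "content".
        params["doPath"] = do_path
    return f"https://{vcenter}/mob/?{urlencode(params)}"


if __name__ == "__main__":
    # "vcenter.example.com" is a placeholder for the vCenter address.
    print(mob_url("vcenter.example.com", do_path="content"))
    print(mob_url("vcenter.example.com", "group-d1"))
```

Opening these still requires logging in with the same account SolarWinds uses, which is the point of the check: it shows what that account can actually see.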

    If the ESXi hosts are present in <vCenter>/mob, proceed with the following:

    • Navigate to the Advanced Configuration page console in SolarWinds: https://<orionservername>/orion/admin/advancedconfiguration/global.aspx
    • Locate "MatchVMWareHostsOnBiosID", tick the checkbox, save

    Then, in the SolarWinds web UI, support's instructions continue as follows (consider not doing any of these steps, as they may be (1) unnecessary and (2) a cause of data loss):

    1. Remove the vCenter and all related nodes
      (did not do that - was unnecessary, and in fact I find it quite unexpected, disruptive and unprofessional that SolarWinds support would ask to basically nuke historical performance and other data when it may not be necessary)
    2. Navigate to Settings -> Manage Nodes and, under the VMware tab, search for all the related nodes. According to the diagnostics (IP addresses), the hosts that are not visible under the vCenter have also been added as standalone nodes. Make sure to remove them all.
      (ditto, did not do that, was unnecessary)
    3. Run DB maintenance to remove all orphaned data/entities (did not do that, was unnecessary)
    4. Add the vCenter again - make sure to use the correct credentials - you do not need to specify credentials for ESX hosts, add only the vCenter, hosts should be rediscovered and readded.
      (did not do that, was unnecessary)
    5. Check the newly added vCenter after a few minutes and verify if there are still missing hosts or if the missing hosts are added as a standalone host/node

    What I did instead of deleting and re-adding vCenter, after checking the "MatchVMWareHostsOnBiosID" box, was restart SolarWinds services ("shutdown everything", then "start everything" via "SolarWinds Platform Services Manager") and wait a bit (under 5 minutes). The missing hosts then showed up.

    There are still some inconsistencies where a newly added ESXi host shows a hardware health issue (power supply failure) that's actually not there - I'll keep at it with SolarWinds support.