Server Nodes in SAM Not Connecting via WMI after Upgrading APEs to Windows Server 2019

Hey all, first time posting here, so let me know if you need any other information (or if there's too much, I honestly don't know).

A little background: I manage an enterprise SW implementation with around 2k server nodes, mostly Windows, that are polled via WMI in SAM. The version is 2020.2.6 HF2. We have a primary poller and 3 APEs, along with a web server and a SQL server. Our entire SW environment was on Windows Server 2012 until last week. Because we're on an outdated version, we're trying to upgrade, and SW support told us to move off 2012 first since it's no longer supported by SW.

Last week we upgraded all of our servers (all on-prem) to Windows Server 2019. Since the upgrade, about half the nodes on the APEs show their NIC either grey or missing in Node Management, and they can't connect to the APEs at all, although they can connect to the primary polling engine. The rest connect to the APEs without issue. I've spent the last few days troubleshooting. SW support suggested it's a WMI issue on the servers the APEs are on, which tracks to me, but I'm unsure why only half the nodes are affected.

Here's a screenshot of some of the nodes with the issue. The impacted nodes all have their NIC showing in an unknown state or missing entirely, and the WMI connection test fails both in Edit Node and when you delete the node and try to re-add it. The same node can be re-added via WMI to the primary polling engine, though:

Meanwhile other nodes on the same APE are working just fine:

When I log into the server and run wbemtest against the failing nodes, it fails, but it goes through just fine against the one that's connecting:

Node from above that is connecting through the web console showing 'Test Successful':

Node showing Test Failed:

If I check the WMI service on the example server that is failing above, it's up and working:

All that changed was the upgrade to Windows Server 2019, and the nodes that are having issues were connecting just fine prior to the upgrade. While they're in this state, they show green as if they're polling, but I have no idea if that's actually the case. Has anyone seen anything like this before, or know where I can look next? One last thing I noticed: on one of the APEs, the network icon in the system tray shows "No Internet Access". The others don't show that, yet they're all having the same issue:
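Since only some nodes fail, one way to narrow things down from an APE is to separate "can't reach the node at all" from "can reach it, but WMI still fails". The following is a minimal sketch, assuming WMI polling here goes over DCOM, which opens its conversation on the RPC endpoint mapper (TCP 135) before moving to dynamic ports. The function names `can_connect` and `triage` are made up for illustration, and a successful connect on 135 does not prove WMI itself will work.

```python
import socket

RPC_ENDPOINT_MAPPER_PORT = 135  # WMI over DCOM starts its conversation here


def can_connect(host, port=RPC_ENDPOINT_MAPPER_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def triage(hosts, probe=can_connect):
    """Split hosts into (reachable, unreachable) lists using the given probe."""
    reachable, unreachable = [], []
    for host in hosts:
        (reachable if probe(host) else unreachable).append(host)
    return reachable, unreachable
```

Running `triage([...])` on an APE with a handful of working and failing node names would show whether the failing half is unreachable at the network level. If every node answers on 135, the problem is more likely in authentication or in the security settings on the APE than in routing or a firewall.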

  • At first glance, potentially a network firewall issue, or maybe a Windows Firewall issue on the new 2019 servers? Do the new 2019 APE servers have the same IP addresses and hostnames as the old servers?

    Are there different GPOs applied within your domain against these new 2019 servers compared to your old ones?

    Shot in the dark, and it may not have anything to do with it, but are there any custom TLS configurations (e.g. were legacy/old versions of TLS/SSL explicitly disabled)?

    I'm guessing you've seen this already - https://support.solarwinds.com/SuccessCenter/s/article/Cannot-add-WMI-node-Test-failed?language=en_US

    Do you use Kerberos auth? Are forward and reverse DNS lookups working from the APEs that are having the issues against the nodes experiencing the problem?

    Best of luck!
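The forward and reverse DNS question above can be checked in bulk from an affected APE. This is a rough sketch, not a SolarWinds tool: `check_dns` is a made-up helper that resolves a name to an IP, resolves that IP back to a name, and reports whether the two agree. The resolver functions are parameters only so the logic can be exercised without a live DNS server.

```python
import socket


def check_dns(hostname, forward=socket.gethostbyname, reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns a dict with the IP, the reverse name, and whether they agree.
    """
    result = {"hostname": hostname, "ip": None, "reverse_name": None, "match": False}
    try:
        result["ip"] = forward(hostname)
        # gethostbyaddr returns (primary_name, alias_list, ip_list)
        name, aliases, _ = reverse(result["ip"])
    except OSError:
        return result  # a failed lookup leaves match=False
    result["reverse_name"] = name
    candidates = [name] + list(aliases)
    wanted = hostname.lower().rstrip(".")
    result["match"] = any(c.lower().rstrip(".") == wanted for c in candidates)
    return result
```

Pass the FQDN rather than the short name, since the comparison is a plain case-insensitive string match and a PTR record normally returns the fully qualified name. A node with `match` set to False is worth a closer look if Kerberos is in use, because Kerberos depends on the name the client asks for.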

  • Thanks for the reply! Same hostnames, same IPs. We actually did an in-place upgrade, so it's all the same. I checked the Windows Firewall and it's still disabled post-upgrade across the APEs. I'll take a look at our DC FW and see if anything is being blocked, but normally any internal traffic should go through, and the IPs didn't change, so I'm not sure that's it. At this point I have no clue. I'll have to check on the GPO thing, but offhand I would assume there's no difference since they're the same FQDN/IP. No idea on TLS/SSL. I believe we have TLS disabled by default, but maybe that changed when we upgraded, so I'll take a look at that. Forward/reverse DNS are still good. That's the doc I was actually referencing when I posted the wbemtest results. No idea about Kerberos authentication though, I'll have to ask about that. Thank you again for the response, I have a few new areas to check. I'll update once I find out more!

    edit - looks like TLS 1.2 is missing from the registry altogether. Based on what I've read so far I'm about 99% sure that's what's causing these issues. Gonna add it and make sure that fixes it, thank you again!
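For anyone wanting to repeat the registry check mentioned in the edit above on each APE, here is a minimal read-only sketch using Python's standard `winreg` module. The SCHANNEL key path and the `Enabled` / `DisabledByDefault` value names are the ones Microsoft documents for TLS protocol settings. The helper names are made up for illustration, and the script only reports what it finds without changing anything. Keep in mind that an absent key usually means the OS default applies, so a missing key is a lead to follow rather than proof that TLS 1.2 is off.

```python
import sys

SCHANNEL_PROTOCOLS = r"SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\Protocols"


def tls12_key_paths():
    """Return the Client and Server subkey paths (under HKLM) for TLS 1.2."""
    return [SCHANNEL_PROTOCOLS + "\\TLS 1.2\\" + role for role in ("Client", "Server")]


def read_tls12_settings():
    """Read Enabled / DisabledByDefault for each key; None means absent.

    Read-only. Must run on Windows, since winreg is Windows-only.
    """
    import winreg  # imported here so the module still loads on other platforms

    report = {}
    for path in tls12_key_paths():
        values = {"Enabled": None, "DisabledByDefault": None}
        try:
            with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, path) as key:
                for name in values:
                    try:
                        values[name], _ = winreg.QueryValueEx(key, name)
                    except FileNotFoundError:
                        pass  # value not set under this key
            report[path] = values
        except FileNotFoundError:
            report[path] = None  # the key itself does not exist
    return report


if sys.platform == "win32":
    for path, values in read_tls12_settings().items():
        print(path, "->", values if values is not None else "key missing")
```

The commonly documented way to enable TLS 1.2 explicitly is `Enabled` = 1 and `DisabledByDefault` = 0 (both DWORD) under each of the two keys. Comparing the output across the primary poller and the APEs should show whether the servers actually differ.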
