Nodes continue to show as down when they are up

We lost connection on our firewalls on two sites today.

Everything came back up, but in SolarWinds several nodes (not all of them) still show as down.

It has been over an hour, and the polling interval is 2 minutes.  I tried manually polling them too, and they still show as down.

I can ping them and log in to them (the nodes showing as down include servers, routers, and wireless access points), so I know they're up, and I can verify SNMP on the devices, yet they still show as down.

I went to the app server and stopped all the SolarWinds/Orion services, verified that I couldn't get to the web console, waited for 5 minutes and restarted them.  They still show as down.

Any idea why these devices are polling as down?  They are set up for both SNMPv2 and ICMP but if I run a test, it fails.

I rebooted the server.  Same problem.

Why is it that I can log in to these servers, and people are able to get files, authenticate, and pull GP on domain controllers, but they still show as down in NPM?

I also see a strange notification that I'm running an evaluation version, but this has been a licensed install from the start, on 2 new servers.

Ideas?

Edit:  An hour after posting this, one of the domain controllers showed up as being up, but the rest are still down.

On a hunch, I pinged one of the other servers from the app server and sure enough, the ping didn't come back.  It resolved to the correct IP address, but did not ping.

The DNS servers are up and running and pingable.

I can ping it from my machine, which is on the same switch and routers as the server, and I can even remote desktop into it from my machine, but the SolarWinds app server can't see it.
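To compare what the app server sees with what a workstation sees, the same three checks (name resolution, ICMP, and a TCP connect) can be run from both machines and the results put side by side. The following is only a minimal sketch using the Python standard library. The function names, the choice of port 3389 (RDP, since remote desktop works from the workstation), and the timeouts are assumptions made for illustration, not anything SolarWinds provides.

```python
import platform
import socket
import subprocess
import sys


def resolve(host):
    """Return the IPv4 address the local resolver gives for host, or None."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror:
        return None


def ping_command(host, system=None):
    """Build a single-echo ping command (Windows flags vs. Linux-style flags)."""
    system = system or platform.system()
    if system == "Windows":
        return ["ping", "-n", "1", "-w", "1000", host]  # 1 echo, 1000 ms timeout
    return ["ping", "-c", "1", "-W", "1", host]  # 1 echo, 1 s timeout (Linux)


def ping(host):
    """Treat exit code 0 as a reply. Windows ping may also exit 0 on a
    'Destination host unreachable' answer, so read the raw output if in doubt."""
    try:
        result = subprocess.run(
            ping_command(host),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0


def tcp_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check(host, port=3389):
    """Run all three checks for one node; port 3389 is an assumed example."""
    ip = resolve(host)
    return {
        "host": host,
        "resolved_ip": ip,
        "icmp": ping(ip) if ip else False,
        "tcp": tcp_open(ip, port) if ip else False,
    }


if __name__ == "__main__":
    for name in sys.argv[1:]:
        print(check(name))
```

Run it with the same node names on the app server and on the workstation. If `resolved_ip` matches on both but `icmp` and `tcp` fail only on the app server, the problem is on the path from the app server to the node rather than on the node or in DNS.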

Why would 2 firewalls dropping off for a few minutes and coming back up cause anything like this?

Parents
  • What mesverrum suggested.

    I would also do a courtesy stop-all and restart of all SolarWinds services on your polling engine.

    Can you confirm that the services on the nodes that are back up are working?

    The symptoms seem firewall-ish.

  • I already stopped all the services and restarted them.  It's in my original post along with a full reboot.

    All the services on all the machines are working.

    We have a file server, a domain controller, and DNS/DHCP, and all are running properly.  Users can pull GP, they are getting IP addresses, and DNS is resolving.  As a matter of fact, DNS resolves on the SolarWinds server when I ping some of these devices, but the pings just time out.  I've tried with the name and the IP.  Same thing.

    If it's a firewall issue, we can't find it.  Nothing should have changed.  We have had these dip several times before and had no issues.

    Routing tables look fine on the SolarWinds server as well, and it is getting information from other nodes on that site.
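Since some nodes on the site answer the SolarWinds server and others do not, grouping the results by subnet can show whether the failures follow a network boundary (which would point at a firewall or routing rule) or are scattered. This is a rough sketch only: the function name, the /24 prefix, and the sample addresses are hypothetical, and the reachability results would have to come from your own ping tests run on the SolarWinds server.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(results, prefix=24):
    """Group reachability results by subnet.

    results maps an IP address string to True (reachable) or False.
    Returns {subnet: {"up": [...], "down": [...]}}.
    """
    groups = defaultdict(lambda: {"up": [], "down": []})
    for ip, reachable in results.items():
        # strict=False lets a host address stand in for its containing network
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        groups[str(net)]["up" if reachable else "down"].append(ip)
    return dict(groups)


if __name__ == "__main__":
    # Hypothetical sample data, not taken from the environment in this thread.
    sample = {"10.1.1.10": True, "10.1.1.20": False, "10.1.2.5": False}
    for subnet, state in group_by_subnet(sample).items():
        print(subnet, state)
```

If every address in one subnet is down while another subnet is entirely up, the rule or route covering that subnet is the first place to look. A mix of up and down inside the same subnet suggests per-host state instead.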
