    Nodes continue to show as down when they are up


      We lost connection on our firewalls on two sites today.

      Everything came back up, but in SolarWinds several nodes (not all of them) still show as down.

      It has been over an hour and the polling interval is 2 minutes.  I tried manually polling them, too, and they still show as down.

      I can ping them and connect/log in to them (the affected nodes are servers, routers, and wireless access points), so I know they're up, and I can verify SNMP on the devices, yet they still show as down.

      I went to the app server and stopped all the SolarWinds/Orion services, verified that I couldn't get to the web console, waited for 5 minutes and restarted them.  They still show as down.

      Any idea why these devices are polling as down?  They are set up for both SNMPv2 and ICMP but if I run a test, it fails.

      I rebooted the server.  Same problem.

      Why is it that I can log in to these servers, and people are able to get files, authenticate, and pull GP from the domain controllers, but they still show as down in NPM?

      I also see a strange notification that I'm running an evaluation version, but this has been a licensed install from the start, on 2 new servers.



      Edit:  an hour after posting this, one of the domain controllers showed up as being up, but the rest are still down.

      On a hunch, I pinged one of the other servers from the app server and sure enough, the ping didn't come back.  It resolved to the correct IP address, but did not ping.

      The DNS servers are up and running and pingable.

      I can ping it from my machine, which is on the same switch and routers as the server, and I can even remote desktop into it from my machine, but the SolarWinds app server can't see it.

      Why would 2 firewalls dropping off for a few minutes and coming back up cause anything like this?

          Did these firewalls have changes that weren't committed to the startup config?

          SolarWinds determines up/down based solely on pings unless you have specified another method on the particular node, so you pretty clearly have something in the path blocking ICMP between your polling server and the nodes.  Traceroute from the server and see where it stops.  Check the inbound and outbound firewall/ACL rules to make sure that there isn't a directional issue.


          Some higher-end firewalls can simulate packet flows to verify that the specified traffic can make it from x.x.x.x to y.y.y.y across specified interfaces.


          Whenever I am troubleshooting ANYTHING in Orion I do it from the polling server itself because I can't count the number of times that I have had firewall/acl/antivirus situations that cut communication to the polling server and didn't impact us on whatever workstation network we happened to be sitting on.
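          A minimal sketch of that "test from the polling server itself" check, assuming Python is available on the polling server. The function names are made up for illustration and nothing here is part of Orion. It only shells out to the operating system's own `ping` and sorts hosts into reachable and unreachable.

```python
import platform
import subprocess


def ping_command(host, count=2, timeout_s=2):
    """Build a ping command for the local OS (the flags differ on Windows)."""
    if platform.system() == "Windows":
        # -n = number of echo requests, -w = per-reply timeout in milliseconds
        return ["ping", "-n", str(count), "-w", str(timeout_s * 1000), host]
    # -c = number of echo requests, -W = per-reply timeout in seconds (Linux)
    return ["ping", "-c", str(count), "-W", str(timeout_s), host]


def is_reachable(host, runner=subprocess.run):
    """Return True if ping exits with code 0.

    Caveat: on Windows, ping can exit 0 when a gateway answers with
    "Destination host unreachable", so treat a True result as a hint only.
    """
    result = runner(ping_command(host), capture_output=True, text=True)
    return result.returncode == 0


def split_by_reachability(hosts, check=is_reachable):
    """Split hosts into (reachable, unreachable) lists, preserving order."""
    up, down = [], []
    for host in hosts:
        (up if check(host) else down).append(host)
    return up, down
```

          Calling `split_by_reachability([...])` with the node IPs, once on the polling server and once on a workstation, and comparing the two results shows quickly whether only the polling server is cut off.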


              Nothing changed on the firewall.

              The only thing that seems to be affected is the SolarWinds app server.  ICMP works for other devices on that server that follow the same network path, so I'm not sure why it's not working on these few.

              Tracert goes one hop and then times out the rest of the way, so it doesn't look like it's getting to the firewall when I tracert on one of those nodes.  When I do the others it hits all the correct hops over the bridge to the server, so if it was an issue with ICMP or SNMP on the SolarWinds server, it wouldn't be picking up the other nodes at that site.
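              To compare a failing node against a working one, the last hop that answered can be pulled out of saved tracert output. This is a rough sketch that assumes the usual Windows `tracert` text layout. The sample addresses are invented, and it only distinguishes "Request timed out" lines from hops that replied.

```python
import re

# A hop line starts with the hop number, e.g. "  1    <1 ms    <1 ms    <1 ms  10.0.0.1"
HOP_LINE = re.compile(r"^\s*(\d+)\s+(.*\S)\s*$")


def last_responding_hop(tracert_output):
    """Return (hop_number, address) of the last hop that replied, or None."""
    last = None
    for line in tracert_output.splitlines():
        match = HOP_LINE.match(line)
        if not match:
            continue  # header, blank line, or "Trace complete."
        hop, rest = int(match.group(1)), match.group(2)
        if "Request timed out" in rest:
            continue  # no reply from this hop
        # The address is the last token; tracert may wrap it as [a.b.c.d]
        last = (hop, rest.split()[-1].strip("[]"))
    return last


# Invented sample resembling the symptom described above
sample = """
Tracing route to 10.1.2.3 over a maximum of 30 hops

  1    <1 ms    <1 ms    <1 ms  10.0.0.1
  2     *        *        *     Request timed out.
  3     *        *        *     Request timed out.
"""
print(last_responding_hop(sample))
```

              If the failing nodes stop at the same hop every time while the working nodes at that site pass through it, that hop (or the device right behind it) is the place to look.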


              I'm not sure how that got marked as the right answer, because, although helpful, it only reinforced all the troubleshooting I had already done and didn't give me any new options.

              I appreciate it, but it shouldn't be marked as a correct answer for this issue.

              Is your firewall a Checkpoint firewall by any chance?

                what mesverrum suggested


                I would also do a courtesy stop-all and restart all SW services on your polling engine


                can you confirm that the services on the nodes that are back up are working?


                seems firewall-ish from the symptoms

                    I already stopped all the services and restarted them.  It's in my original post along with a full reboot.

                    All the services on all the machines are working.

                    We have a file server, a domain controller, and DNS/DHCP and all are running properly.  Users can pull GP, they are pulling IPs and DNS is resolving.  As a matter of fact, DNS resolves on the SolarWinds server when I ping some of these devices, but they just time out.  I've tried with the name and the IP.  Same thing.


                    If it's a firewall issue, we can't find it.  Nothing should have changed.  We have had these dip several times before and had no issues.

                    Routing tables look fine on the SolarWinds server as well, and it is getting information from other nodes on that site.

                    If you go to the web console and poll from the settings, on both the SolarWinds server and the server the website is on, you should be able to see the website then. Let me know if it works.

                      I had that issue with a few nodes a while back...what I ended up having to do was make a database correction.  The field that indicates whether a system is up / down / unmanaged wasn't being updated correctly.  I worked through this issue with support.  It happened a while ago and I don't remember which field we corrected...once the node was corrected in the database, it started responding again to normal up / downs and such.

                          Ok, so we also found that Mobile Admin works as far as I can connect to the server remotely and see what is down, but I'm not getting push notifications for Mobile Admin.  Nothing was changed with SolarWinds or Mobile Admin, so I have to assume that something in the firewall is causing the issue, possibly blocking the push notifications, although we have yet to find it.


                          To be clear, I DO get the e-mail notifications I set up in SolarWinds Orion, but not the push alerts for Mobile Admin.


                          Further, we had a virtual host and all the servers associated with it go down early this morning (still investigating the cause), but I didn't get down notifications for them in Mobile Admin or through e-mail.  I did get the e-mail alert that 3 of the 4 virtual servers came back up and that they were down for 4-6 minutes, but no e-mail notification that they went down and still no up notification on the host and one of the servers, even though both are up and we're looking at the logs.  We also had to reboot one of those virtual servers manually to get an application to work and that didn't show up at all.  Granted, the time it was down may not have been very long, but I should have gotten a 'reboot' notification, and I never did.  I can go into Mobile Admin and see that the server was rebooted at 7:05am, but there was no reboot alert e-mail and no push alert from Mobile Admin.   It looks like Mobile Admin is getting the right information, just not alerting, and I'm not at all sure why the e-mail alerts, which are separate, aren't working correctly either.  They did pick up some of them and some of the nodes that went up and down, but not all.


                          Something must have changed on the firewall side when they put the old ones back in, but the old ones were never shut down, just bypassed until we determined there was a major issue, and then put back online.


                          None of it should have caused the Mobile Admin alerts to stop, though.
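                          On the point that a short outage may not show up at all: a simplified model, assuming polls fire at fixed multiples of the 2-minute interval mentioned earlier. This is not how Orion actually schedules polls; it only illustrates why an outage shorter than the interval can fall between two polls, while a 4-6 minute outage should be seen at least once.

```python
import math


def polls_during_outage(outage_start_s, outage_duration_s, poll_interval_s=120):
    """Count polls at t = 0, interval, 2*interval, ... that land inside
    the outage window [start, start + duration)."""
    outage_end_s = outage_start_s + outage_duration_s
    # Index of the first poll at or after the outage start
    first = math.ceil(outage_start_s / poll_interval_s)
    # Index of the last poll strictly before the outage end
    last = math.ceil(outage_end_s / poll_interval_s) - 1
    return max(0, last - first + 1)


# A 60-second reboot that starts 10 s after a poll is never observed
print(polls_during_outage(10, 60))
# A 4-minute outage starting 30 s after a poll is observed twice
print(polls_during_outage(30, 240))
```

                          Under this model, a missed alert for a quick manual reboot is plausible, but a missed "down" alert for servers that were down 4-6 minutes is not explained by polling timing alone.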

                              OK, so I went through a ticket with support because I was getting a message that one of the evaluation licenses had expired.  They had me remove all the licenses, then I had to have them deactivated so I could reactivate them.  As soon as I did that, all those nodes came back up.  HOWEVER, I'm still not getting alerts on Mobile Admin.

                              I was getting them on the desktop notification tool prior to the license reset, but I'm not getting them there now, either.

                              I checked configuration and restarted the services.  Restarted the phone, too, and so far, they don't match up.  The Mobile Admin still shows those nodes down.

                            It appears this fixed itself.

                            We tried one of the firewalls again today and again, the same nodes went down, but after a couple hours, they started coming back online.

                            I still am not entirely sure what fixed it.


                            Mobile Admin alerts appear to be coming in, but a couple of hours late, which doesn't help.  That's for a different post.


                            Thanks to all that helped!



                            We found out a few days ago that there is a bug in the SonicWall that's affecting bridged network traffic.

                            This is likely the issue.  We have a support ticket in with them to get it fixed.