6 Replies Latest reply on Apr 7, 2016 2:16 PM by rschroeder

    With multiple NPM pollers distributed across the country, what alerts do you get when one of them is unavailable?  Any?

    rschroeder

      We're revamping our NPM/NCM environment from several stand-alone NPM regional deployments to one Main NPM server with multiple polling servers.

       

      We just tested what we'd learn if the WAN to a regional poller failed.  None of the regional sites that are monitored by it showed down.

       

      And the poller itself did not display as a lovely and large alarm in the primary NPM node status screen; we could only see an issue was present by looking into the poller's status and seeing it hadn't been sync'd in ten minutes.

       

      Obviously I can monitor each regional poller and send alerts when they're unavailable.  I already monitor the WAN routers supporting each region from the master poller in the main region's data centers, so I won't go without data.

       

      But I'm interested in seeing what others have done to ensure they get the right alerts and displays and notification when this happens.

       

      I sort of hoped to see a nearly overwhelming flood of down alerts coming in when a major / regional WAN goes unavailable instead of no news at all.  At least then I'd know about the problem; I could build dependencies and filter out what's chaff and what's critical.

       

      How do you handle this type of setup?  Do you get alerts about the nodes that are only monitored by the unreachable NPM poller?  Or no alerts at all?

       

      Rick S.

        • Re: With multiple NPM pollers distributed across the country, what alerts do you get when one of them is unavailable?  Any?
          cahunt

          If we lose a poller, we get NO Alerts from any node that is monitored by that poller. No stats, no updates.  Whatever the state last reported will stay that way until the poller comes back to collect data again.

            * What I do get is an alert that my poller went down, or rather, that the server went down. If you want application monitoring to alert you when a service hangs or begins to eat resources, that needs to be set up as well.

               This alert still triggers because we have more than one poller, and I cross-monitor the pollers.  Poller 1 monitors Poller 3 and the Main App Engine (we keep the polling load off our main app due to heavy DB input and past web page performance issues), Poller 2 monitors Poller 1, and Poller 3 monitors Poller 2. This ensures that when a poller goes offline I see that alert, instead of realizing 25 minutes later that nodes are not updating.
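The cross-monitoring arrangement described above can be sanity-checked with a few lines of code. This is only a sketch: the engine names and the `watches` mapping are hypothetical stand-ins for however you record which engine polls which, and nothing here calls a SolarWinds API. It simply confirms that every engine is watched by at least one other engine.

```python
# Hypothetical names throughout; this does not call any SolarWinds API.
def unwatched(engines, watches):
    """Return the engines that no *other* engine is monitoring."""
    watched = set()
    for watcher, targets in watches.items():
        for target in targets:
            if target != watcher:  # an engine watching itself does not count
                watched.add(target)
    return sorted(set(engines) - watched)


# The layout from the post: the main engine does no polling of its own.
engines = ["main", "poller1", "poller2", "poller3"]
watches = {
    "poller1": ["poller3", "main"],
    "poller2": ["poller1"],
    "poller3": ["poller2"],
}

print(unwatched(engines, watches))  # an empty list means full coverage
```

If the list is not empty, the engines it names could fail without anything noticing.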

           

          With older versions of the app, we had issues where polling collection just quit with no real visible indicator other than the lack of updates.  So if you're running into that issue, take a look at the Windows Counters and SW Server App Monitoring Templates. From there you can start trending and building custom alerts based on what you see.

          Cheers!

          • Re: With multiple NPM pollers distributed across the country, what alerts do you get when one of them is unavailable?  Any?
            cahunt

            Another thing to consider is what type of circuits you are working with. If you always get a light, the interface will not go down unless there is a hardware issue, so you need to watch your trends and alert on something other than a down interface. (Of course, keep that alert in place in case of hardware failure, but consider an EIGRP neighbor loss alert to know if your WAN box is losing the connection to your routers.)  And monitor from both ends, of course; if possible, see if you can monitor an internal and an external IP address for sites.
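The neighbor-loss idea amounts to comparing the neighbor list from one poll to the next. A minimal sketch, assuming you already collect the EIGRP neighbor addresses per poll by some means (the addresses below are made up, and how you gather them is outside this example):

```python
def lost_neighbors(previous, current):
    """Return neighbors seen in the previous poll but missing from the current one."""
    return sorted(set(previous) - set(current))


# Made-up example data: the interface may still show "up" in both polls.
before = ["10.0.0.1", "10.0.0.2"]
after = ["10.0.0.1"]

for neighbor in lost_neighbors(before, after):
    print(f"EIGRP neighbor lost: {neighbor}")
```

The point is that the alert fires on the missing adjacency, which changes even when the physical interface never reports down.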

            Don't overload your device though, so if both IPs hit the same box, make sure one is Ping Only ..<tangent>.. Some Nexus devices are bad at being dully polled, especially if you are using VDCs. Keep your pollers light on everything but the main context (full details there: stats, inventory if you do it, history; only interfaces and critical details on the other contexts).

            • Re: With multiple NPM pollers distributed across the country, what alerts do you get when one of them is unavailable?  Any?
              temark

              I recently opened a support case for this same scenario.  I was told that the current version of NPM has no way to roll polling of nodes over to another Poller when a Remote Poller goes down.  We recently lost connectivity on primary and secondary WAN links to a location where one of our Remote Pollers is located.  We got no indication from NPM that anything was wrong.

               

              Since then I've set up ICMP-only polling of that poller from our primary NPM server, as well as ICMP-only polling of a loopback on the core switch in that same datacenter, so we'll at least get some alert that a site and our Remote Poller are unreachable.
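Polling two targets at the same site lets you tell a dead poller apart from a dead site. A rough sketch of that reasoning, assuming the two booleans come from whatever ICMP checks you run from the primary server (the function name and the message strings are illustrative only):

```python
def classify(poller_up: bool, loopback_up: bool) -> str:
    """Interpret two ICMP results taken from the primary NPM server."""
    if poller_up and loopback_up:
        return "ok"
    if loopback_up:
        # The site answers but the poller does not: likely a poller problem.
        return "poller down, site reachable"
    if poller_up:
        # Unusual case: the poller answers but the core switch loopback does not.
        return "loopback down, poller reachable"
    # Neither answers: most likely the WAN path to the site is gone.
    return "site unreachable"


print(classify(poller_up=False, loopback_up=True))
```

Only the last case means the nodes behind that poller are invisible for network reasons; the second case points at the poller server itself.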

                • Re: With multiple NPM pollers distributed across the country, what alerts do you get when one of them is unavailable?  Any?
                  rschroeder

                  One thing I ended up doing was setting up polling of the various NPM servers by remote NPM pollers.  That way I at least know when the regional NPM hardware is unavailable.

                   

                  I also poll the regional routers / gateways from my local NPM solution, which lets me know which regional routers are actually up or down, even though their local poller is off line.

                   

                  I created a Dependency and Group that helps keep me from being overloaded with alerts when a regional site or its poller is down.

                   

                  Finally, there's a slick little option in the Engineers' Toolset that lets me manually create a mini NPM/Poller on my laptop, into which I've dumped all the regional routers' management addresses.  Even if the regional AND the local NPM poller were to go offline, I can start up the local poller on my laptop and see what's actually down from my local network's point of view.  That feature's been great when doing local changes that have the potential to cause major outages.  I can see the results of a command immediately, rather than waiting for a 2-minute poller cycle to show it in NPM.

                    • Re: With multiple NPM pollers distributed across the country, what alerts do you get when one of them is unavailable?  Any?
                      Jenya

                      rschroeder, we were in the same boat before. We had our main WAN link between two pollers go down and got no alerts. I opened a case, and support explained to me that since the nodes polled by the additional poller don't get updated in the database while that poller is cut off, they keep the same status as before (usually "up"). We had to monitor the additional poller from the main poller and vice versa. We set up both node up/down and application / component monitors. Now we actually get alerts when services on the remote poller get restarted (annoying), but we know when something bad happens. We get an alert saying that the far side WAN router is down and the SW poller services are down or unreachable.
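Because node status simply freezes when a poller is cut off, one option is to alert on the age of the last update instead of on the status itself. A minimal sketch, assuming you can obtain a last-sync timestamp per polling engine from somewhere; the `last_sync` dictionary, the engine names, and the ten-minute threshold are all assumptions for illustration:

```python
from datetime import datetime, timedelta


def stale_engines(last_sync, now, max_age=timedelta(minutes=10)):
    """Return engines whose last sync is older than max_age."""
    return sorted(name for name, ts in last_sync.items() if now - ts > max_age)


# Made-up timestamps for illustration.
now = datetime(2016, 4, 7, 14, 0)
last_sync = {
    "main": datetime(2016, 4, 7, 13, 59),
    "regional-east": datetime(2016, 4, 7, 13, 40),  # 20 minutes old
}

print(stale_engines(last_sync, now))
```

A stale engine means every node assigned to it is showing its last known status, not its current one.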

                       

                      You mentioned dependencies: make sure your additional poller is not dependent on the far side WAN router, or you won't get alerts on it.
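The dependency pitfall is easy to see in a toy model. The sketch below is a deliberate simplification with a single parent per node and hypothetical node names; it is not how NPM implements dependencies internally. An alert is suppressed whenever the node's parent is also down.

```python
def alerts_to_send(down, parent_of):
    """Return down nodes whose alert is not suppressed by a down parent."""
    down = set(down)
    return sorted(node for node in down if parent_of.get(node) not in down)


down_now = ["wan-router", "remote-poller", "site-switch"]

# Poller made a child of the far side WAN router: its alert is suppressed.
with_dependency = {"remote-poller": "wan-router", "site-switch": "wan-router"}
print(alerts_to_send(down_now, with_dependency))

# Poller left out of the dependency: it alerts alongside the router.
without_dependency = {"site-switch": "wan-router"}
print(alerts_to_send(down_now, without_dependency))
```

In the first case only the router alerts; in the second the poller alerts too, which is the behavior the advice above is after.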

                       

                      One thing I don't have figured out is this... since the main poller is responsible for sending email to the mail server for alerts... when the Main poller goes down and the additional poller is up you get nothing. Maybe that's addressed by having a separate monitoring system monitor SolarWinds or one of those failover engines which we don't have. For now we just keep SolarWinds up on a screen - if it goes down we get a visual indication.
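One way a separate system could watch the main poller is a plain TCP probe run from a machine that does not depend on SolarWinds to send its notifications. A minimal sketch under that assumption; the host name and port in the usage comment are hypothetical, and what you do on failure (mail, page, log) is left out:

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


# Example use from an independent machine (hypothetical host and port):
#     if not tcp_reachable("npm-main.example.com", 443):
#         notify_by_some_other_channel()
```

Run on a schedule, this at least replaces the "keep it up on a screen" check with something that can notify you when nobody is looking.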
