
    Multiple Poller Performance: What Works?

    clarsen

      We're coming up on the need for multiple pollers, and the new pollers will be placed in WAN-separated sites.  Naturally I'd like to keep the WAN chatter to a minimum, so I'm wondering how people before me have prevented this from being an issue, since -- as I understand it -- each remote poller will want to frequently discuss life with the main DB located on our first/main Orion server. Some questions:

       

      Firstly, is this even a real problem?  Do remote pollers chat that much with the DB? Assuming they do, what seems to be the best solution?

       

      The fix I was thinking about was clustering the DB servers and letting each site chat locally, then letting the DBMS replication do the dirty work at some stately scheduled pace.  Of course the problem there is that the data is no longer "live", but at least each local copy is, which also soothes my desire for CYA backups.  Aside from the update delay you also have an alert-awareness delay, but adding remote pollers to the show pretty much guarantees a delay somewhere, so you may as well try to mitigate the delay duration and location.  (As an aside, might there be a way to perform a similar trick with Patch Manager, so that remote hits on that DB are faster too?)

       

      I've read the basic Solarwinds docs, but I wanted to hear about real-life experiences.  What are the issues, options, etc. you've run into, and what does / doesn't work?

      (If this has been beaten to death before, just point me that-a-ways.)

       

      Thanks!

      Curtis

        • Re: Multiple Poller Performance: What Works?
          RichardLetts

          The remote pollers really only push into the database the data they have obtained via SNMP, so if you're already polling those sites remotely across the WAN today, expect roughly that much traffic.

          Database replication won't reduce the WAN load, since the same database updates the poller generates are the ones that need to be replicated (and the load might even be higher, because data that doesn't need to be at the remote site will also get replicated there).

           

          Honestly, unless you have a security policy in place that prevents your pollers from sitting in one place, I would simply poll everything from one place using SNMP.

          I might apply QoS to the SNMP traffic (e.g. make it high priority but limited to 10% of the network bandwidth) so it always gets that 10% when it needs it, which should be more than enough for monitoring needs.
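          For rough sizing, here is a back-of-envelope sketch of that reasoning. Every number in it (per-poll byte counts, poll interval, link speed, node counts) is an assumption picked for illustration, not a measured SolarWinds figure, so treat it as a starting point and plug in values from your own environment.

```python
# Back-of-envelope estimate of SNMP polling load vs. a 10% QoS reservation.
# All inputs are illustrative assumptions; measure your own environment.

nodes = 200                  # devices polled at the remote site
interfaces_per_node = 10     # interfaces with statistics collected
bytes_per_interface = 400    # assumed SNMP request+response per interface per poll
bytes_per_node = 600         # assumed node-level CPU/memory/status polling per cycle
poll_interval_s = 120        # typical statistics polling interval (2 minutes)

wan_link_bps = 10_000_000    # a 10 Mbit/s WAN link
qos_reservation = 0.10       # the 10% class described above

bytes_per_cycle = nodes * (bytes_per_node + interfaces_per_node * bytes_per_interface)
polling_bps = bytes_per_cycle * 8 / poll_interval_s   # average load in bits per second
reserved_bps = wan_link_bps * qos_reservation

print(f"Polling load:     {polling_bps / 1000:7.1f} kbit/s")
print(f"10% reservation:  {reserved_bps / 1000:7.1f} kbit/s")
print(f"Reservation used: {polling_bps / reserved_bps:7.1%}")
```

          With those made-up numbers the polling traffic uses only a few percent of even a 10% class on a modest link, which is why central SNMP polling with a QoS policy is usually workable.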

            • Re: Multiple Poller Performance: What Works?
              clarsen

              Richard - Thanks much for the reply: that gives me some things to check on and ponder, and I'll certainly be doing that.  I may even try setting up a test environment to see the results.

               

              Let me add a couple of items that might help frame the question better.  Aside from performance, I'd also like to have some sort of local installation at the remote WAN endpoints that could potentially be used as a DR DB replacement should the WAN link fail, which is another reason I was thinking about DB replication.  (Perhaps it would be better for me to simply set up completely separate DBs at each site and then use a MOM to observe status, but that seems like overkill right now.)  SolarWinds offers a DR product that can take over locally for a time should the WAN fail, and that may be a solution for me, but it has some limitations.  I was just hoping that folks had done some general home-grown tweaks on that front that they might share.  I've been told that replication can be configured to use less bandwidth, but I'm not a DBA, so I'd have to check on what exactly was meant by that.

               

              I'm also shifting from using just SNMP to using WMI to monitor Windows systems.  There are fair reasons for that outside my control: it offers more information and security for us at the cost of more overhead, and I've heard MS plans to eventually eliminate use of SNMP on Windows systems in favor of WMI.  It's also native rather than requiring the SNMP service to be added to each machine, and my Windows folks like it better, so... <shrugs>.  The overhead of WMI vs. SNMP is such, though, that I've balked at polling a lot of machines over the WAN using WMI, but I'm getting more pressure to do so as we're adding yet more Windows machines to various remote sites.

               

              So that brings me to the head-scratching point I'm at now, where I'm looking for any feasible solutions, starting with the cheaper ones, because... yeah.

               

              Hope that helps.  Thanks again for the reply, though: I appreciate any feedback from people who've already trodden this path.

                • Re: Multiple Poller Performance: What Works?
                  Mark Roberts

                  In my experience with deploying APEs (Additional Polling Engines) across remote WAN links, it is essential to understand the impact and have the infrastructure to support it.

                   

                  It is perfectly feasible to place APEs remotely, with the following considerations:

                   

                  1. Polling a device uses far less bandwidth than turning that polling data into SQL insert/update queries (see the rough sketch after this list).
                  2. APE-to-SQL communication needs to be good quality (LAN quality), as it is more sensitive to network quality issues than the SNMP or WMI polling.
                  3. Apart from cases where connectivity over a WAN makes WMI unusable, SNMP, WMI or agent-based polling collects the same information whether a device is polled remotely or centrally; only the ICMP latency and packet-loss values become less relevant.
                    1. SAM response-time monitors will obviously be affected as well.
                  4. Security implications of polling remotely vs. centrally, i.e. routing, firewalling, ACLs, etc.
                  5. If an APE loses connectivity, it can only cache a finite amount of data. When it comes back online, the cached data is bulk-uploaded and can therefore hit the connection harder at exactly the time other services also want that bandwidth to recover. QoS, as Richard mentioned, will help.
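                  To put rough numbers on points 1 and 5, here is a small sketch. Every figure in it (bytes per element, poll interval, outage length) is an assumption chosen for the example rather than a measured SolarWinds value, so swap in numbers from your own deployment before drawing conclusions.

```python
# Rough illustration of points 1 and 5 above.
# All figures are assumptions for the example, not measured SolarWinds values.

elements = 2000                 # monitored elements handled by the remote APE
poll_interval_s = 120           # statistics polling interval in seconds
snmp_bytes_per_element = 400    # assumed SNMP request+response per element per poll
sql_bytes_per_element = 1500    # assumed insert/update traffic per element per poll
                                # (row data plus protocol and index overhead)

def average_bps(bytes_per_element: int) -> float:
    """Average bits per second generated by one polling cycle's worth of traffic."""
    return elements * bytes_per_element * 8 / poll_interval_s

# Point 1: with a remote APE, the SQL writes cross the WAN instead of the SNMP polls.
print(f"Central poller, SNMP over the WAN:   {average_bps(snmp_bytes_per_element) / 1000:7.1f} kbit/s")
print(f"Remote APE, SQL writes over the WAN: {average_bps(sql_bytes_per_element) / 1000:7.1f} kbit/s")

# Point 5: if the APE loses the database, cached results pile up and are
# bulk-uploaded when it reconnects.
outage_minutes = 60
backlog_mb = average_bps(sql_bytes_per_element) / 8 * outage_minutes * 60 / 1e6
print(f"Backlog after a {outage_minutes}-minute outage: {backlog_mb:7.1f} MB to upload on reconnect")
```

                  With these assumptions the write traffic is a few times larger than the equivalent SNMP polling, and an hour-long outage leaves tens of megabytes to flush back across the link on reconnect, which is where the QoS policy earns its keep.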

                  Hope this helps.
