9 Replies Latest reply on Feb 13, 2013 10:23 PM by casey.schmit

    NPM 10.2 polling capacity

    Ciag

      Hi Folks,

      I've been doing a bit of reading about the new, long awaited, fully fledged multi-threaded polling engine in NPM 10.2. I'm yet to upgrade but am considering making the change soon and after reading this doc I have a couple of questions.

      It is mentioned that polling performance now depends mostly on the hardware capability of the polling server. Does the approximate limit of '8000 elements at a 10-minute interval' still apply, or does this mean that with an SLX license the polling intervals can be decreased until the CPU and memory are close to being maxed out?

      If I were to continue throwing resources at the polling server, would that increase the server's capacity to manage a higher polling 'weight'? I know the doc mentions the maximum polling load for NPM jobs as 2600; is this a true value or just an example? If it is accurate, how does a maximum weight of 2600 translate into real-world polling ability? Can you give an example of what weight a Cisco 4500 switch would have when monitoring CPU, memory, and 20 ports?

      How does 1 unit of weight compare with the old terminology of 'elements'? Is there any direct correlation?

      Regards

      Ciaran 

        • Re: NPM 10.2 polling capacity
          mavturner

          Ciaran, these are all good questions.

          For now we are taking a phased approach. You will get around 10,000 elements per poller instead of only 8,000. Before, you would run into issues where the hardware wasn't keeping up, and you could throw more resources at it to make a difference. In 10.2 this is no longer true. You will get around 10,000 elements on a polling engine at standard polling intervals.

          From a technical perspective, each of our jobs carries a certain weight. For example, we have a weight for status, statistic collection, and the topology job. When that weight is exceeded, the other queued jobs will be delayed. This means you won't "miss" a poll, but it could be delayed in heavily utilized environments.
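
          The "delayed, not missed" behaviour can be pictured with a small sketch. This is not the actual NPM scheduler: the function `run_cycle`, the per-job weights, and the budget of 6 are all hypothetical values chosen only to illustrate a weight-limited queue.

```python
from collections import deque


def run_cycle(queue, max_weight):
    """Start queued jobs in order until the weight budget is used up.

    Jobs that do not fit stay in the queue for the next cycle,
    so they are delayed rather than dropped.
    """
    started = []
    used = 0
    while queue and used + queue[0][1] <= max_weight:
        name, weight = queue.popleft()
        started.append(name)
        used += weight
    return started


# Hypothetical job weights, for illustration only (not real NPM values).
queue = deque([("status", 2), ("statistics", 3), ("topology", 4), ("status", 2)])

first = run_cycle(queue, max_weight=6)   # topology does not fit, so it waits
second = run_cycle(queue, max_weight=6)  # the delayed jobs run in the next cycle

print(first)
print(second)
```

          In this toy model the topology job is pushed to the second cycle, but it still runs.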

          Mav

            • Re: NPM 10.2 polling capacity
              Ciag

              Hi Mav,

              thanks for your reply. That's great news about the additional polling 'elements' that the 10.2 engine can handle; it will certainly make a big difference to my environment.

              I presume the limit of 10,000 is an imposed one and not an application limitation, which I sort of get....

              With regard to the 3 weighted jobs you mentioned: are these the only jobs included in the weighting methodology, or are there more? Also, can you specify which jobs are given a higher weight than others? Knowing this is very important for capacity planning, provisioning resources and, most importantly, poller load balancing. I will be undertaking all of these tasks after the 10.2 upgrade, and having this information would help in my reallocation of resources.

              Regards    

                • Re: NPM 10.2 polling capacity
                  casey.schmit

                  For NPM, we weight the status, statistics, and rediscovery jobs for nodes, interfaces, and volumes. Other modules may weight their jobs, but those only factor into the jobs for that module.  They wouldn't affect anything related to the NPM jobs or NPM weighting.

                • Re: NPM 10.2 polling capacity
                  waxxworx

                  Hi, this is a good thread and I'm surprised there isn't more conversation on Thwack about the new poller. We are so far unhappy with the throttling, as it appears to be arbitrary when evaluated on our large, underutilized server, where multi-threading could potentially do great things.

                  My specific question right now (apart from complaining about the overall implementation) is when and how the job weight is calculated. My main poller has a calculated weight of 2994 and therefore, being over 2600, is getting throttled. I just moved about 2000 elements to another poller. Neither poller's job weight has changed, though the element counts did update. I saw something similar when we first went to 10.2: this main poller had about the same amount of work and a very low calculated job weight. One Monday, after a scheduled reboot, I came back and the job weight had suddenly pushed it above 100% with no change in the workload.

                  How can we troubleshoot this job weighting? I just don't think it's working correctly. Most of our elements are volumes and they've all been set to a 1-hour polling interval, and yet, from the way the job weightings compare between pollers, it seems like they may be counted differently.
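
                  For reference, the arithmetic behind "over 100%" with the figures quoted in this thread can be sketched as below. The helper `load_percent` is a hypothetical name, and treating the load as total weight divided by the 2600 maximum is an assumption based on the numbers above, not a confirmed NPM formula.

```python
def load_percent(total_weight, max_weight=2600):
    """Return the job weight as a percentage of the maximum weight.

    Assumption: load is simply total_weight / max_weight.
    """
    return 100.0 * total_weight / max_weight


weight = 2994                   # calculated weight quoted for the main poller
pct = load_percent(weight)
excess = weight - 2600          # weight above the quoted maximum

print(f"{pct:.1f}% of maximum, {excess} over")
```

                  Under that assumption the poller sits at roughly 115% of the maximum, so about 394 units of weight would have to move elsewhere to get back under the threshold.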

                    • Re: NPM 10.2 polling capacity
                      Ciag

                      Hi Waxxworx,

                      that is exactly my concern about the new polling service. More accurate reports to show poller health but no real concrete info on how to plan or troubleshoot your polling environment.

                      I had assumed that the quietness of this thread was a sign that everyone else was getting on successfully with the 10.2 set-up; maybe that is still the case and your environment is an exception. Have you raised a support ticket to try to get to the bottom of this?

                      Regards

                      Ciaran

                      • Re: NPM 10.2 polling capacity
                        travis_vanholland

                        How do you know when your Orion Polling engine is being throttled? Is it based off the polling rate maximum? Is it based off the Total Job Weight?

                         

                        I'm sitting here with a polling completion of 99.99% wondering why I need to buy an additional poller when I have such beefy hardware to back up the software. If there are stability issues in the software such that jobs won't complete once a certain number of elements is reached, fine. But this seems like a push from SolarWinds for us to buy additional pollers.

                        • Re: NPM 10.2 polling capacity
                          casey.schmit

                          Waxxworx,

                           

                          I'd open a support case for us to take a look at the details of your setup. There are several details (element counts, polling intervals) that factor into it. It's been a while since I worked on the weighting code, but the weight is updated fairly quickly any time a job is added (new nodes, new interfaces) or updated (polling intervals are modified). It takes a few minutes for this to bubble up to the polling engine details pages.