29 Replies Latest reply on Apr 19, 2018 7:57 AM by douglasmauro

    Element Load Balancing

    nick_scott

      Like many SolarWinds customers, we have a large environment with over 10,000 nodes. It's constantly changing and routinely needs elements/nodes reallocated to the polling engine with the least load. My question is: can someone help me, or point me in the right direction, to develop a report that queries all nodes and provides the number of polled IPs, interfaces, volumes, and SAM components for each?

       

      If it makes it easier, I would be fine with leaving the SAM part out; for the most part I'm concerned with elements (NPM).

       

      Thanks community!!

        • Re: Element Load Balancing
          Steven Klassen

          Hey there, give this a try. You can also drop this into the web-report writer as a SWQL report:

           

          SELECT
              n.Engine.ServerName AS PollerName
              ,n.Caption
              ,ISNULL(i.Qty, 0) AS InterfaceCount
              ,ISNULL(v.Qty, 0) AS VolumeCount
              ,ISNULL(i.Qty, 0)+ISNULL(v.Qty, 0)+1 AS TotalElements
          FROM Orion.Nodes n
          LEFT JOIN
          (
              SELECT NodeID, COUNT(1) AS Qty FROM Orion.NPM.Interfaces GROUP BY NodeID
          ) i ON n.NodeID = i.NodeID
          LEFT JOIN
          (
              SELECT NodeID, COUNT(1) AS Qty FROM Orion.Volumes GROUP BY NodeID
          ) v ON n.NodeID = v.NodeID
          ORDER BY PollerName, TotalElements DESC
          
            • Re: Element Load Balancing
              patriot

              Nice report, Steven, and great timing, as I was thinking about how to construct the same report. Is there a way to also show a total element count for each polling engine in this same query?
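
              One way to get a per-engine total is to aggregate Steven's query by polling engine. The following is an untested sketch, not a confirmed answer from the thread: it assumes SWQL accepts GROUP BY on the n.Engine.ServerName navigation property and SUM over the joined counts. NodeCount is a column name added here for illustration, and the +1 per node follows the original query's convention of counting the node itself as an element.

              ```sql
              SELECT
                  n.Engine.ServerName AS PollerName
                  ,COUNT(n.NodeID) AS NodeCount
                  ,SUM(ISNULL(i.Qty, 0)) AS InterfaceCount
                  ,SUM(ISNULL(v.Qty, 0)) AS VolumeCount
                  ,SUM(ISNULL(i.Qty, 0) + ISNULL(v.Qty, 0) + 1) AS TotalElements
              FROM Orion.Nodes n
              LEFT JOIN
              (
                  SELECT NodeID, COUNT(1) AS Qty FROM Orion.NPM.Interfaces GROUP BY NodeID
              ) i ON n.NodeID = i.NodeID
              LEFT JOIN
              (
                  SELECT NodeID, COUNT(1) AS Qty FROM Orion.Volumes GROUP BY NodeID
              ) v ON n.NodeID = v.NodeID
              GROUP BY n.Engine.ServerName
              ORDER BY TotalElements DESC
              ```

              This returns one row per polling engine rather than one row per node, so it is probably best kept as a second report alongside the per-node version.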

              • Re: Element Load Balancing
                wluther

                mrxinu It would be great if SolarWinds would just assign some sort of weighted value to each node and make that value accessible to us, so we could see a real-time cost-of-service value per node. It would make moving nodes around much easier.

                • Re: Element Load Balancing
                  RichardLetts

                  I should point out that the number of elements is not linearly correlated with the polling rate.

                   

                  Consider four of our polling engines below (All four servers are identical, aside from the nodes being polled.)

                  The two pollers running at 50% or less (P5 and P7) each have more than 10,000 elements on them.

                   

                                                P1      P3      P5      P7
                  ELEMENTS                      8805    9779    12546   10371
                  POLLING RATE                  77%     87%     50%     45%
                  HARDWARE HEALTH POLLING RATE  41%     48%     7%      25%
                  ROUTING POLLING RATE          1%      0%      0%      0%
                  UNDP POLLING RATE             0%      1%      0%      0%
                  WIRELESS POLLING RATE         0%      13%     16%     1%
                  INTERFACE ELEMENTS            7119    7523    11832   8298
                  NETWORK NODE ELEMENTS         1645    2253    704     2063
                  NUMBER OF HW HEALTH MONITORS  1248    1467    495     1550
                  NUMBER OF HW HEALTH SENSORS   11013   10696   3986    13916
                  POLLING COMPLETION            99.99   100     99.99   99.99
                  TOTAL JOB WEIGHT              3394    4288    3986    4177
                  VOLUME ELEMENTS               41      3       10      10

                   

                  This suggests that additional polling load has a disproportionate cost.

                   

                  Note: P7 has more of everything than P1 (other than volume elements and routing polling, and they are not /that/ different), yet has a considerably lower polling rate (a difference of 32 percentage points).

                   

                   

                    • Re: Element Load Balancing
                      wluther

                      RichardLetts I think the magic is in having the weight/score of each node visibly attached as a metric for that node, however it is SolarWinds wants to do that, so we can query that score, and balance the pollers better.

                       

                      Like you said, you have 4 identical polling servers, yet two of them have thousands more elements, while they are only being 50% utilized, in comparison. A node's weight should stay the same across all pollers, with only the polling rate percentage varying, correct?

                       

                      Are you saying the weight of each node would differ, based on which polling engine it was placed on? Or, are you saying that the weight of each node would be a set number (based on polling frequency, interfaces, vols, apps, etc.), with the weight remaining the same across any polling server, with only each server's polling rate percentage changing?

                      If a node has a total polling weight of 100 (as an arbitrary example, calculated by polling frequency, interfaces, vols, apps, etc.), are you saying the weight for that node would be 100 on P1, but 150 on P2, or 90 on P3, etc.? Or are you saying the weight of the node would be calculated as 100, regardless of which polling engine it was placed on, only producing a varied polling rate percentage per polling server? (Maybe 100 = .3% on P1, but 100 = 1.2% on P5, or 100 = .91% on P7)

                       

                      How are you managing your pollers/devices polled? (Other than repetition and experience) Do you manually balance your pollers, and then apply custom properties to the nodes on each poller to manage them? I am always looking for better ways to manage mine, as I'm sure we all are, but it looks like you are running with at least 2x more than me.

                       

                       

                      Thank you,

                       

                      -Will

                        • Re: Element Load Balancing
                          RichardLetts

                          What I'm trying to say is that I cannot find a correlation between the poller load and the counts or weights of anything on it.

                          I suspect that the relationship between load and counts is nonlinear; i.e. below a certain threshold the load increases more or less linearly, but once you cross that threshold (the knee) it blows up.

                           

                          [Image: apples_and_lemons (from xkcd sometime, I think)]

                           

                          This makes automatic balancing difficult (so I gave up trying).

                           

                          We manage our load by adding new devices to the App server and periodically (weekly) manually moving them onto the least loaded polling server.

                          If something has just been added, it's then easy to tell when/if wacky alerts fire or if someone hasn't added it to our CMDB properly...

                        • Re: Element Load Balancing
                          douglasmauro

                          Did you get your table via a report, or did you have to create it by hand? If it was a report, care to share? TIA.
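
                          The thread does not say how the table was produced. As a possible starting point only, per-engine metrics with names like these may be available through the Orion.EngineProperties entity. That entity and its EngineID, PropertyName, and PropertyValue columns are assumptions here and should be checked in SWQL Studio before relying on this sketch:

                          ```sql
                          SELECT
                              e.ServerName AS PollerName
                              ,p.PropertyName
                              ,p.PropertyValue
                          FROM Orion.Engines e
                          INNER JOIN Orion.EngineProperties p ON p.EngineID = e.EngineID
                          ORDER BY p.PropertyName, e.ServerName
                          ```

                          If it works as assumed, this returns one row per engine per property, so the result would still need to be pivoted (by hand or in a spreadsheet) to get one column per poller.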

                      • Re: Element Load Balancing
                        cahunt

                        I have a report that shows the Job Weight for each Orion Server and the Weight Scale Metric for each Application Component. Element counts are also included.

                        Though it would be great to see this natively.

                         

                        Orion Server Polling Load Details

                        • Re: Element Load Balancing
                          sheepy28

                          thanks for this, had a similar issue