Polling Engine Load Balancing

Hi,

We have paid for an additional polling engine. Great news: my main polling engine gets into his running clothes, ready for all that room to breathe, and starts going 100 miles an hour.

But wait, sorry mate, you are going to have to wait for the technician to work out which nodes need to go where so he can balance the load manually. Wait... what?

There's my rant. Why aren't there recommendations for moving nodes from one polling engine to another?

Maybe a test option to see if the 1000 nodes on one polling engine will work on another?

We currently have a main poller and an additional one, with lots of modules, but the main ones that are an issue are NPM and SAM.

Main polling engine

pastedImage_2.png

additional polling engine

pastedImage_3.png

I know they're both high and I need more, but why doesn't the system say "move these nodes that do not require SAM from the main poller to the additional poller"?

I get that there's money involved and all that, but generally SolarWinds is used by lots of different people who do not use SolarWinds as their main job. It really should look after itself. Can the cost of adding this feature not be added to the price of an additional engine?

Or maybe a report function, as I can't work out how to look in SWQL Studio for something like nodes polled via WMI from the main poller that do not use SAM. I'm nearly there with the report, to be honest.
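For anyone attempting the same report, here is a rough sketch of the filter logic only, not a SWQL query. It assumes you have already exported the node list and the list of nodes with SAM applications assigned. The field names `NodeID`, `EngineID` and `ObjectSubType`, and the function name itself, are assumptions for illustration, so check them against your own schema.

```python
from typing import Dict, Iterable, List, Set


def wmi_nodes_without_sam(
    nodes: Iterable[Dict[str, object]],
    sam_node_ids: Set[int],
    engine_id: int,
) -> List[Dict[str, object]]:
    """Return nodes on one polling engine that are polled via WMI
    and have no SAM application assigned.

    The dictionary keys used here are assumed column names.
    """
    result = []
    for node in nodes:
        on_engine = node["EngineID"] == engine_id
        is_wmi = str(node["ObjectSubType"]).upper() == "WMI"
        has_sam = node["NodeID"] in sam_node_ids
        if on_engine and is_wmi and not has_sam:
            result.append(node)
    return result


# Made-up sample data: engine 1 is the main poller.
sample_nodes = [
    {"NodeID": 1, "EngineID": 1, "ObjectSubType": "WMI"},
    {"NodeID": 2, "EngineID": 1, "ObjectSubType": "SNMP"},
    {"NodeID": 3, "EngineID": 1, "ObjectSubType": "WMI"},
    {"NodeID": 4, "EngineID": 2, "ObjectSubType": "WMI"},
]
sample_sam_ids = {3}
candidates = wmi_nodes_without_sam(sample_nodes, sample_sam_ids, engine_id=1)
```

The nodes it returns are the ones that could probably move to the additional poller without touching SAM, subject to the usual firewall and ACL checks.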

Just feels like a missed part that would make solarwinds stand out from the rest.

11 Comments
Level 13

Upvote from me. I have a similar feature request out there about adding a feature to auto-balance pollers. Check it out if you want.

Upvoted yours too, cheers.

Level 12

All,

Please be careful here, as this would only work nicely if all devices allowed connections from anywhere with any protocol. You have to take into account access lists, firewall rules, SNMP configurations, routing, etc. when moving nodes from poller to poller. I have seen clients generate hundreds of alerts because they installed a new APE and moved a bunch of routers and switches over to it. The auto move would also need to run a verification test prior to shifting the workload from the existing poller to the new one.

Best Regards,

Derik Pfeffer

Loop1 Systems: SolarWinds Training and Professional Services
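The "verify before you move" idea above could look roughly like the sketch below. It is only an illustration: `probe` is a hypothetical callable that would have to run from the target poller and test whatever the node needs (SNMP, WMI, ICMP), and nothing here is an existing Orion function.

```python
from typing import Callable, Iterable, List, Tuple


def split_by_reachability(
    nodes: Iterable[str],
    probe: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split nodes into (movable, blocked) based on a pre-move probe.

    `probe(node)` should return True only if the target poller can
    actually poll the node. A probe that raises OSError is treated
    as a failed check, so the node stays where it is.
    """
    movable: List[str] = []
    blocked: List[str] = []
    for node in nodes:
        try:
            reachable = probe(node)
        except OSError:
            reachable = False
        if reachable:
            movable.append(node)
        else:
            blocked.append(node)
    return movable, blocked


# Stand-in probe for the example: pretend an ACL blocks one router.
def fake_probe(node: str) -> bool:
    if node == "router-b":
        raise OSError("connection refused")
    return node != "switch-c"


ok_nodes, held_back = split_by_reachability(
    ["router-a", "router-b", "switch-c", "server-d"], fake_probe
)
```

Only the nodes in the first list would be reassigned; the second list is what an admin would need to fix firewall rules or ACLs for first.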

Hi, yeah, I was speaking with patrick.hubbard at London SWUG17 and he was very keen on a testing option before it committed the move to a new polling engine.

He then got some of the staff to take notes, so hopefully this may get some traction.

Level 11

A couple of us at my company are working on building a tool to do this ourselves, per a suggestion from Solarwinds Technical Support, after I asked if they have a method of doing this. We have 8 polling engines, and doing Windows Patches and other maintenance that requires rebooting each polling engine is a nightmare for our 2nd shift. It takes hours just to shuffle nodes around so that they do not generate false alerts when their polling engine gets rebooted. However, the idea came up when I did a DR test and it took so long to evacuate one "failed" APE. In a disaster, I have better things to be doing than managing my monitoring tool.

Our Balancing Tool outline is this:
Check the database for the list of polling engines and their status (up, down, unmanaged, etc.) and the number of nodes and/or elements on each one.

Provide a button to manage/unmanage any given polling engine.

Provide a button to balance the load among the polling engines that are "up", so any down/unmanaged pollers will be evacuated and the load roughly evenly balanced among the remaining pollers. In short, calculate the maximum desired load for each polling engine (0 for any poller not "up", or [total nodes/live pollers+10%] for any poller that is "up"), and for any poller that's over the target, move some nodes to any poller that's under the target. Repeat until none are over the target.

When finished with maintenance on each server, just remanage the poller, unmanage the next one, and rebalance. When done, remanage all and rebalance again. In the event of a disaster recovery scenario where one or more polling engines is affected, they're already detected as Down so there's no need to unmanage; just hit "Rebalance" and those get evacuated and the load distributed among everything else. Why has Solarwinds not done this YEARS ago? Should be a standalone app that runs on any polling engine, database server, or even your workstation. All it needs is access to the database server.
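For illustration, the rebalance step in that outline might be sketched like this. It is not the actual tool: it only plans moves in memory and never touches a database. It assumes node counts are the only load measure, and it reads the target as total nodes divided by live pollers, plus 10%, rounded up. The function and parameter names are made up.

```python
from typing import Dict, List, Tuple


def plan_rebalance(
    assignments: Dict[str, List[str]],
    is_up: Dict[str, bool],
    headroom_pct: int = 10,
) -> List[Tuple[str, str, str]]:
    """Plan (node, from_poller, to_poller) moves.

    Pollers that are not up get a target of 0 (full evacuation).
    Live pollers get ceil(total / live * (1 + headroom_pct / 100)).
    """
    live = [p for p in assignments if is_up.get(p, False)]
    if not live:
        raise ValueError("no live pollers available to receive nodes")

    total = sum(len(nodes) for nodes in assignments.values())
    # Integer ceiling division avoids floating point rounding surprises.
    target = -(-total * (100 + headroom_pct) // (len(live) * 100))

    load = {p: list(nodes) for p, nodes in assignments.items()}  # work on a copy
    moves: List[Tuple[str, str, str]] = []
    for src in load:
        limit = target if is_up.get(src, False) else 0
        while len(load[src]) > limit:
            others = [p for p in live if p != src]
            if not others:
                break  # src is the only live poller, nothing to do
            dst = min(others, key=lambda p: len(load[p]))  # least loaded
            if len(load[dst]) >= target:
                break  # no live poller has room left
            node = load[src].pop()
            load[dst].append(node)
            moves.append((node, src, dst))
    return moves


# Example: ape2 is down for patching, main is overloaded.
example = {
    "main": [f"n{i}" for i in range(8)],
    "ape1": ["a1", "a2"],
    "ape2": ["b1", "b2"],
}
planned = plan_rebalance(example, {"main": True, "ape1": True, "ape2": False})
final_counts = {p: len(nodes) for p, nodes in example.items()}
for _node, src, dst in planned:
    final_counts[src] -= 1
    final_counts[dst] += 1
```

Returning a plan rather than applying it leaves room for a verification or approval step before any node is actually reassigned.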

ATTN Solarwinds: Please steal my idea and do it better!

Level 11

I should point out that one prerequisite for any "dumb" load balancing method for Orion is that all pollers are in all firewall rules or ACLs, so that traffic is allowed from any poller to any polled node. Otherwise moving a node from one poller to another could cause it to be unreachable. Similarly, your environment could not have concerns about where a node is polled from, such as polling a node in the US from a poller in Europe or vice versa.

These restrictions could be lifted by checking for a custom property (or IP range, or hostname pattern) on each node indicating where it can or can't be polled from... but that's just a thought if someone else has issues with this. We don't, so I won't be including it in our tool.
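As a rough sketch of that custom property idea: the property name `AllowedPollers` and its comma-separated format are invented for this example, and a node without the property is treated as pollable from anywhere.

```python
from typing import Dict, List


def eligible_pollers(
    node_props: Dict[str, str],
    pollers: List[str],
    prop_name: str = "AllowedPollers",  # hypothetical custom property
) -> List[str]:
    """Return the pollers a node may be assigned to.

    An empty or missing property means no restriction.
    """
    raw = node_props.get(prop_name, "").strip()
    if not raw:
        return list(pollers)
    allowed = {name.strip() for name in raw.split(",") if name.strip()}
    return [p for p in pollers if p in allowed]


all_pollers = ["MAIN", "EU-APE1", "US-APE1"]
eu_only = eligible_pollers({"AllowedPollers": "EU-APE1, EU-APE2"}, all_pollers)
unrestricted = eligible_pollers({}, all_pollers)
```

A balancer would then only pick destinations from this list for each node, and leave the node alone if the list is empty.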

Level 13

Derik,

What I was referring to here is a way for the system to choose the poller automatically, with a fail-safe or option that lets the administrator decide whether the software picks the poller or whether he or she assigns it manually. Discoveries, adding nodes, etc. all default by design to the primary server. Someone who isn't aware of that and constantly adds servers or devices will easily top out or exceed the monitoring capacity of the primary, when in reality they may have several additional servers sitting idle. It's an idea designed for two styles of thinking.

One:

The junior level admin or regular users who aren't fully aware of the software's design. The auto function will be greatly helpful.

Two:

The expert Admin or savvy user who is aware of the software designs and can determine how they want the environment set.

There are other intricate details to worry about and consider, I get it. I was just giving a high-level view of the idea, so hopefully someone with more knowledge can take it, run with it, and develop it. That's all really.
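At that same high level, the "auto unless the admin says otherwise" behaviour could be as small as the sketch below. The names are illustrative and load is just a node count per poller.

```python
from typing import Dict, Optional


def choose_poller(load: Dict[str, int], manual_choice: Optional[str] = None) -> str:
    """Pick a poller for a newly added node.

    If the admin names a poller, use it. Otherwise pick the least
    loaded one, breaking ties alphabetically so the result is stable.
    """
    if manual_choice is not None:
        if manual_choice not in load:
            raise KeyError(f"unknown poller: {manual_choice}")
        return manual_choice
    return min(sorted(load), key=lambda p: load[p])


current_load = {"main": 900, "ape2": 300, "ape1": 300}
auto_pick = choose_poller(current_load)
admin_pick = choose_poller(current_load, manual_choice="main")
```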

Yea... this needs to become more of a thing. I've been loading new nodes into my SW a lot recently, totally forgetting that it defaults to poller 1. I didn't realize that I ended up crushing it. Manual load-balancing is sooooo 1990s.

Level 9

Please!