Additional Pollers - Automatic Load Distribution

Hi there.

We recently added additional pollers to our SolarWinds system and I was shocked that you have to manually allocate nodes to specific polling engines.

My feature request is for the OPTION of automatic load balancing across additional pollers in an upcoming release. Some thought should go into how the load distribution works, and we need to keep the option to manually assign nodes (in my opinion there are corner cases where you want to manually control certain nodes).
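To make the request concrete, here is a rough sketch of what I mean by automatic balancing that still respects manual assignments. It is only an illustration, not how Orion works: the node names, the per-node weights (think "number of polled elements") and the `balance` function are all made up for the example. Pinned nodes stay where they are; everything else goes to whichever poller currently has the least load.

```python
import heapq

def balance(nodes, pollers, pinned=None):
    """Greedy balancing sketch (hypothetical, not an Orion API).

    nodes   : dict of node name -> polling weight
    pollers : list of poller names
    pinned  : dict of node name -> poller for manually assigned nodes
    """
    pinned = pinned or {}
    load = {p: 0 for p in pollers}
    assignment = {}

    # Manually pinned nodes are placed first and never moved.
    for node, poller in pinned.items():
        assignment[node] = poller
        load[poller] += nodes[node]

    # Min-heap of (current load, poller) so the lightest poller pops first.
    heap = [(l, p) for p, l in load.items()]
    heapq.heapify(heap)

    # Place the heaviest remaining nodes first, each on the lightest poller.
    free = sorted((n for n in nodes if n not in pinned), key=lambda n: -nodes[n])
    for node in free:
        l, p = heapq.heappop(heap)
        assignment[node] = p
        heapq.heappush(heap, (l + nodes[node], p))
    return assignment

nodes = {"a": 50, "b": 30, "c": 20, "d": 10}
assignment = balance(nodes, ["p1", "p2"], pinned={"d": "p2"})
print(assignment)
```

A real implementation would need a better notion of weight than a single number per node, which is exactly the "consideration" part of the request.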

Thank you.

18 Comments
Level 12

Yes, I agree with this as well. This is a feature sorely needed in Orion.

One of the reasons our organization is looking at implementing Microsoft System Center Operations Manager 2012 R2 is its design for scaling the monitoring environment up or scaling it sideways: SCOM uses the concept of a "Management Server Pool". Details can be seen here: Microsoft SCOM Training – Learn about System Center Operations Manager – Microsoft Virtual Academy.

Quote taken from the free training material at MVA:

"As you decide to expand the capacity (of SCOM), you will add additional servers into the management pool group. You roll out another management server that will be in the same pool. As soon as you do this, some of the roles that were running on the first management server automatically get balanced to the new management server. This takes the load off the first server, and spreads it out around the pool. As you add additional management servers, the roles will further balance. Therefore, the pool will take care of making sure that you have enough resources and you are doing everything you can to spread out across the various management servers. In addition, if one goes down, the role that would be running on that server will automatically fail over to the other management server. Everything is balanced across the pool."

- Copyright Microsoft

- Joe

Level 11

That's a nice story, Joe, but cast a critical eye on all that fluffy Microsoft sales jargon... I still have scars from MOM, SCOM v1 (the non-working edition) and SCOM 2007 R2.

The truth is that none of these solutions scale up past a certain point without pain, and they often let the users find where that limit exists.

Level 12

ctmidnight, oh trust me, I hear you there. I've been burned by MS as well (most of us have). We are still in the process of testing SCOM 2012 R2 in our environment and I am still highly critical of its so-called dashboard and monitoring capabilities. Personally I am not impressed (not yet anyway). I don't get the warm fuzzies when I hear the term "SCOM". But one thing is true: there are pros and cons that must be weighed with every solution out there, even with SolarWinds.

Level 9

Having the ability to set up a primary/backup poller would be great. I've had instances where my additional polling engine stopped and I didn't find out about it for far too long. A backup or secondary poller that automatically takes over on failure would be a great addition.

This really cannot be that hard to achieve in some form. I am working on a solution for automatic polling engine failover right now.

Seems like an alert could be set up with an action:

Alert:

Node selection: all nodes assigned to engine 5

Alert trigger: engine 5 services, interface (whatever you want) not working

Alert action: execute something (script action, SQL update, whatever works best) to move the affected nodes to engine 11. Only one value in the DB has to change.
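A rough sketch of what that alert action could boil down to. Big assumptions here: I am pretending the nodes live in a table called `Nodes` with an `EngineID` column, and the demo runs against a throwaway in-memory SQLite table rather than the real Orion database. The `move_nodes` helper and the node captions are made up for illustration.

```python
import sqlite3

def move_nodes(conn, from_engine, to_engine):
    """Reassign every node on from_engine to to_engine; return how many moved."""
    cur = conn.execute(
        "UPDATE Nodes SET EngineID = ? WHERE EngineID = ?",
        (to_engine, from_engine),
    )
    conn.commit()
    return cur.rowcount

# Stand-in table with a hypothetical schema, just to exercise the update.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Nodes (NodeID INTEGER PRIMARY KEY, Caption TEXT, EngineID INTEGER)"
)
conn.executemany(
    "INSERT INTO Nodes (Caption, EngineID) VALUES (?, ?)",
    [("core-sw-01", 5), ("edge-rtr-02", 5), ("web-srv-03", 7)],
)

moved = move_nodes(conn, from_engine=5, to_engine=11)
print(moved)
```

Updating a production database directly is obviously something to test carefully first; the point is only that the move itself is a single-column change.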

Here is what I have now:

Current alert suppression using dependencies:

Custom property CP_Area: value = poller1, poller2, poller3, etc.

Group with dynamic query: if CP_Area = poller1, assign to group 1 (etc.)

Dependency: all members of group 1 are dependent on the poller1 interface being operational.

This solves the problem of sending off 1000 alerts, when someone accidentally disables the interface, but I think the ability to assign members to a group based on the value of their engineID would be more direct and more efficient. 

I just need a way to use engineID in a dynamic query, and to use engineID as the node selection criteria in an alert that can execute the action to change the engineID on the nodes.

For example:

Group 1 = all nodes assigned to poller1.

Group 1 is dependent on poller1 being operational.

Alert:

All nodes on poller1.

If poller1 is down, execute action to move nodes to poller3.

This way, once the nodes are moved they also move to group 3 and alerting resumes automatically, but they will not alert while poller1 is down and they are still assigned to poller1.
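The logic I am after, sketched outside of Orion. Everything here is hypothetical (the function names, the node names, the engine numbers); it just shows the intended behaviour: nodes on a down poller stay quiet, and once they are moved to a healthy poller their alerting comes back on its own.

```python
def alerting_enabled(node, node_engine, engine_up):
    """A node may alert only while its assigned poller is operational."""
    return engine_up.get(node_engine[node], False)

def fail_over(node_engine, engine_up, failed, standby):
    """Return a new node -> engine mapping with nodes moved off a down poller.

    Nothing moves unless `failed` is actually down and `standby` is up.
    """
    if engine_up.get(failed, False) or not engine_up.get(standby, False):
        return dict(node_engine)
    return {n: (standby if e == failed else e) for n, e in node_engine.items()}

node_engine = {"sw1": 1, "rtr2": 1, "srv3": 3}
engine_up = {1: False, 3: True}  # poller1 is down, poller3 is healthy

before = alerting_enabled("sw1", node_engine, engine_up)   # suppressed
after_move = fail_over(node_engine, engine_up, failed=1, standby=3)
after = alerting_enabled("sw1", after_move, engine_up)     # alerting resumes
print(before, after, after_move)
```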

So the question is, is it possible to assign nodes to a group based on their actual assigned polling engine?

Can an alert's object selection criteria be based on the assigned polling engine?

Can a SQL update or something similar be created as an action?

Level 11

The most commonly observed issue with our external pollers is a process like svchost.exe or w3wp.exe using excessive system resources, so the poller stops processing work or starting new tasks... The poller itself rarely croaks.

So the neat trick would be to monitor internal logs for specific errors and initiate a failover based on the error type or frequency; not just when the poller DB heartbeat starts failing or it dies outright.
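A minimal sketch of the frequency part, assuming something else is already picking the relevant error lines out of the logs and handing over their timestamps. The class name, the threshold and the window size are all made up; it just counts matching errors inside a sliding time window and says when a failover would be warranted.

```python
from collections import deque

class ErrorRateTrigger:
    """Signal a failover when enough errors land inside a sliding window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self.times = deque()

    def record(self, timestamp):
        """Record one matching error; return True if the threshold is reached."""
        self.times.append(timestamp)
        # Drop errors that have aged out of the window.
        while self.times and self.times[0] <= timestamp - self.window:
            self.times.popleft()
        return len(self.times) >= self.threshold

# Hypothetical tuning: 3 matching errors within 60 seconds.
trigger = ErrorRateTrigger(threshold=3, window_seconds=60)
results = [trigger.record(t) for t in (0, 10, 100, 110, 120)]
print(results)
```

Two early errors followed by a quiet spell do not trip it; three errors in quick succession do. Keying separate triggers by error type would cover the "based on the error type" half.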

bump

Community Manager
Status changed to: Open for Voting