klinejw
Level 10

Multi-Core/Multi-PPE Implementations?

Who has implemented multiple core/primary polling engine servers in their implementations? Why did you implement multiple cores? Was it for scalability or other reasons? What is the maximum number of elements deployed per core? I have been administering an Orion implementation with a single core and 5 additional polling engines for many years, and now I am faced with an implementation that is considerably larger than my previous one. I know the documented limit for 1 primary polling engine is 100,000 elements. I am trying to understand whether that is a hard limit or whether it can be extended based on your individual implementation.

Thanks in advance!

19 Replies
rob.hock
Level 17

Re: Multi-Core/Multi-PPE Implementations?

We do have customers running far beyond 100k elements, and the limiting factor is almost always SQL Server performance, specifically disk I/O. With a beefy enough SQL backend, and perhaps some tuning of polling intervals, this number can be extended well beyond 100k. Leon Adato may have some further experience to share regarding scalability.

adatole
Level 17

Re: Multi-Core/Multi-PPE Implementations?

rob.hock is absolutely correct about the 100k element limit being a function of your database, not of any internal "lock" from the software.

With regard to your original question - when you say "multiple cores", do you just mean a server that has multiple CPUs (or a virtual server with multiple CPUs/cores assigned to it), or something else?

The reasonable limit for elements per poller is ABOUT 10,000 - the SolarWinds specs say 8,000, but I've pushed it as high as 12,000. The devil is in the details: what kinds of elements, how often they are polled, what kinds of machines those elements are on (an old Cisco 6509 is not going to give up its data as fast as a Nexus 2000), etc.

I know that adds up to a big pile of "it depends" but unfortunately... it depends.

It also depends on what ELSE you are running - is it just NPM, or do you also have SAM, or NTA (NetFlow)? Are you using (or planning to use) the new #DPI features in NPM 11 (hint: YES! YES YOU ARE!!)? Do you have 800 alert triggers? Those are each queries, and will have an impact on your primary poller. Do you have 2,000 users logged into the web console, each of whom is sitting on their favorite 3,000-row report? That's gonna leave a mark.

You get the point.

At the end of the day, here are the limits and options you have:

  • About 8,000 to 10,000 elements per poller
  • More CPUs are good. More RAM is gooder
    • figure 8 CPUs and 12 GB RAM per polling engine. Maybe more for the primary.
  • Faster database is goodest of all
    • hardware, not virtualized
    • for 100,000 elements, you should think around 12-24 effective CPUs (that can be 4 processors with 6 cores each)
    • for that same 100k elements, think around 96-128 GB RAM
    • RAID 10. Say it with me again: RAID 10.
      • Not RAID 5. No, really.
  • Watch your pollers. When they get too busy, add more. Yeah, it's an expense. Not as expensive as missing a critical alert.
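To turn the per-poller rule of thumb into a rough engine count, a minimal sketch is below. It assumes the 8,000 (spec) and 10,000 (practical) elements-per-poller figures from this post and a hypothetical total element count; real capacity still depends on element types, polling intervals, device responsiveness, and installed modules.

```python
import math

# Rule-of-thumb capacities quoted in this post (assumptions, not hard limits):
CONSERVATIVE_ELEMENTS_PER_POLLER = 8_000   # the SolarWinds spec figure
OPTIMISTIC_ELEMENTS_PER_POLLER = 10_000    # "ABOUT 10,000" in practice


def pollers_needed(total_elements: int, elements_per_poller: int) -> int:
    """Return how many polling engines (primary included) are needed to
    cover total_elements at the given per-engine capacity, rounding up."""
    if total_elements < 0 or elements_per_poller <= 0:
        raise ValueError("elements must be >= 0 and capacity must be > 0")
    return math.ceil(total_elements / elements_per_poller)


if __name__ == "__main__":
    total = 65_000  # hypothetical count of nodes + interfaces + volumes
    low = pollers_needed(total, OPTIMISTIC_ELEMENTS_PER_POLLER)
    high = pollers_needed(total, CONSERVATIVE_ELEMENTS_PER_POLLER)
    print(f"{total} elements -> {low} to {high} polling engines")
```

Rounding up is deliberate: a partially loaded extra poller is cheaper than an overloaded one, which is the "watch your pollers" point above.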

Hope that helps.

- Leon

sja
Level 16

Re: Multi-Core/Multi-PPE Implementations?

Hi

Leon is certainly right.

I would push for a DBA who can help you with the SQL side.

For the SQL hardware, if money is no object, look at this kind of I/O card:

Application Acceleration – Enterprise Flash Memory Platform | Fusion-io

Deltona
Level 15

Re: Multi-Core/Multi-PPE Implementations?

The element count limit per polling engine has been raised to 12K; the old limit was 10K. The old overall limit of 100K is for a multi-tenant deployment consisting of 10 polling engines (10,000 x 10 = 100,000). The limit now is 120,000 elements in total. Once the element count is exceeded, polling intervals will automatically be throttled, and you'll likely end up with a poll interval of 200 seconds or more instead of the default 120 seconds.
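The arithmetic in this post can be checked with a short sketch. The per-engine limits and engine count are the figures quoted here; the interval function is a purely hypothetical linear model (an assumption, not SolarWinds' documented throttling behavior) that illustrates how a 120-second default could stretch to about 200 seconds when an engine carries roughly 1.67 times its rated load.

```python
# Figures quoted in this post; treat them as assumptions, not as
# official SolarWinds documentation.
OLD_PER_ENGINE = 10_000
NEW_PER_ENGINE = 12_000
ENGINES = 10                # one primary plus nine additional polling engines
DEFAULT_INTERVAL_S = 120    # default poll interval in seconds


def deployment_ceiling(per_engine: int, engines: int) -> int:
    """Total element ceiling for a deployment of identical polling engines."""
    return per_engine * engines


def estimated_interval(elements: int, capacity: int,
                       default_interval_s: int = DEFAULT_INTERVAL_S) -> float:
    """HYPOTHETICAL model: at or below capacity the default interval holds;
    above it, assume the interval stretches in proportion to the overload.
    Illustration only, not SolarWinds' actual throttling algorithm."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    if elements <= capacity:
        return float(default_interval_s)
    return default_interval_s * elements / capacity


if __name__ == "__main__":
    print(deployment_ceiling(OLD_PER_ENGINE, ENGINES))  # 100000 (old overall limit)
    print(deployment_ceiling(NEW_PER_ENGINE, ENGINES))  # 120000 (new overall limit)
    # One engine rated for 12,000 elements but carrying 20,000:
    print(estimated_interval(20_000, NEW_PER_ENGINE))   # 200.0
```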

In my experience, the SolarWinds job engine has a hard time keeping up with anything above 10K elements. That is, however, in my environment, where we use all the modules except IPAM. Your SQL server should be built to take the load.

klinejw
Level 10

Re: Multi-Core/Multi-PPE Implementations?

Great information, guys. We are in the process of implementing as we speak. When I was talking about 'cores', I was referring to primary polling engines. We are planning on deploying all modules (NPM, SAM, WPM, NCM, VNQM, NTA, UDT). I have successfully pushed a polling engine beyond the 10-12K limit without adverse effects. My previous engagement was a much smaller infrastructure, so I am going from a 2,000+ node installation to a 6,500+ node installation. As it stands, I don't think we are going to need more than one primary polling engine, since we are currently licensed for only 8 additional polling engines, unless we can push these polling engines beyond the 12K limit. Our plan is to run all the Orion servers as VMs, including the database instance. We have some pretty robust infrastructure backing our VMware environment, so I am confident that we can support the load. As stated, it does sound like it "just depends" on a lot of factors. We are replacing an existing toolset in parallel, so we can see how the health of the SolarWinds environment plays out as we deploy our nodes, and we will adjust polling engines and primary polling engines as necessary. Thanks, everyone.

klinejw
Level 10

Re: Multi-Core/Multi-PPE Implementations?

Deltona - just for clarification: once the 120K limit is reached for a primary polling engine, is polling automatically throttled, or is it only throttled if polling rates can't keep up with the element load?

johan
Level 8

Re: Multi-Core/Multi-PPE Implementations?

We have deployed two primary polling engines, each with an additional poller, and one with an additional web front end as well. The primary reason was total segmentation and control. We have a mix of physical and virtual servers; however, we prefer the DB on physical hardware.

We currently monitor just over 30K elements in total, so we had to expand. The next step is for us to tie the two primary instances together with EOC to get a consolidated view across all our client environments. We might also be looking at several additional polling engines in the next year or two!

Your design will depend on a few things:

1) Total number of elements you monitor (regardless of license level) - each polling engine can handle around 10K-11K elements (an element is a node, interface, or volume), depending, of course, on your polling interval and on CPU and memory resources

2) Each primary polling engine can have up to 9 additional polling engines below it with a max element count of around 100K

3) Your design should take into consideration the geography of your devices and polling details - for instance, if you want latency stats between your Head Office and remote branches, the polling engine should reside in your Head Office so that the latency reflects the local path(s).

4) It is not recommended to have an additional polling engine on a remote site with your DB at a different site. Instead, opt for two primary pollers and use EOC to merge the two together. Your stats/reports will be more accurate, and your design will be certified/supported by SolarWinds. They do not recommend the remote additional poller approach.

5) If it is a hosted (MSP) environment, additional network issues such as NAT and overlapping IP addresses are a real factor. Having separate primary engines removes the need for firewall rules or NAT, which can get very complicated to troubleshoot.

6) With EOC you should cater for at least 1 Mbps bandwidth between the EOC instance and the Primary polling engines
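Points 1, 2, and 6 can be combined into a rough capacity plan. The sketch below assumes 10,000 elements per engine (the low end of the 10K-11K quoted), one primary plus up to 9 additional engines per instance, and 1 Mbps of EOC bandwidth per primary, counted only when more than one primary exists. The names `plan_deployment` and `Plan` are hypothetical, and the model ignores the geography and segmentation reasons (points 3-5) that may justify separate primaries anyway.

```python
import math
from dataclasses import dataclass

# Planning figures from the post above (assumptions; tune for your environment):
ELEMENTS_PER_ENGINE = 10_000   # post quotes ~10K-11K elements per engine
MAX_APES_PER_PRIMARY = 9       # additional polling engines under one primary
EOC_MBPS_PER_PRIMARY = 1.0     # minimum EOC-to-primary bandwidth


@dataclass
class Plan:
    primaries: int
    additional_engines: int
    eoc_bandwidth_mbps: float


def plan_deployment(total_elements: int) -> Plan:
    """Hypothetical capacity-only plan: how many primary instances and
    additional polling engines are needed for total_elements."""
    engines = max(1, math.ceil(total_elements / ELEMENTS_PER_ENGINE))
    engines_per_instance = 1 + MAX_APES_PER_PRIMARY  # primary + 9 APEs
    primaries = math.ceil(engines / engines_per_instance)
    additional = engines - primaries
    # EOC is only needed to consolidate more than one primary instance.
    eoc = EOC_MBPS_PER_PRIMARY * primaries if primaries > 1 else 0.0
    return Plan(primaries, additional, eoc)


if __name__ == "__main__":
    for total in (30_000, 100_000, 250_000):
        print(total, plan_deployment(total))
```

For example, 250,000 elements works out to 25 engines, split across 3 primary instances with 22 additional polling engines, plus about 3 Mbps of EOC bandwidth.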

Hope this info helps you!

Deltona
Level 15

Re: Multi-Core/Multi-PPE Implementations?

No, it's still only 12,000 elements per Primary Polling Engine (the main engine). The PPE cannot handle more than 12,000 elements.

As for the second question, you are right in both cases. It depends on the polling load and/or the number of monitored elements. If the polling load or the element count is too high, the polling interval will be throttled.
