
Orion Architecture

Level 15
Legacy Page

This blog was posted in 2011 and is no longer accurate.  For the latest information, check out the following resources:

SolarWinds Orion Platform Scalability Guide

Orion Platform Architecture and Deployment Options Video

This blog recaps, in relatively simple terms and diagrams, the basics of Orion’s architecture.

 

It is not exhaustive in terms of products and deployment combinations, but it should give you the basic rules, which you can then adapt to your particular Orion deployment.

 

This blog is designed to help you quickly become familiar with most of the concepts and terminology, but it does not replace the architectural considerations described in each product’s user documentation set.

 

It is also a good set of pointers to many excellent blog postings that have been written in the past on all these products and components. Just follow the links…

 

The “*” in front of the names in the diagrams below denotes commercial products. Boxes without “*” are modules that come with the Orion infrastructure and cannot be bought separately (e.g. Core).

 

I hope you’ll enjoy it. As always, post your comments and questions here; we’ll try to respond to them and improve this blog.

 

Basic software architecture

 

 

image_thumb_325087C1.png

 


 

Scalability: Growing an instance of Orion

 

  • With the Additional Web Server (more users)
  • And the Additional Polling Engine (larger/more networks)

 

image_thumb_4E380DA0.png

 


 

Scalability: Consolidating multiple instances of Orion

 

  • EOC can be used for scalability or organizational needs (e.g. regional and national responsibilities; visualization offered at both levels)

 

image_thumb_115D151A.png

 


 

Segmented deployments

 

  • Avoiding VPN access with the Additional Web Server

 

image_thumb_70D5D567.png

 


Standalone products: APM, UDT, IPAM, NCM, SEUM

 

 

image_thumb_3D999BFE.png

 


 

Standalone products (shared DB server)

 

  • It is possible to host several SQL Server databases on the same physical database server
  • We do not support two instances of Orion products sharing the same database, but multiple Orion databases can share the same SQL Server host if it is appropriately sized
  • We recommend running the DB Server on a physical server (not a VM)

 

image_thumb_5629394E.png

 


 

Highly Available Architecture: the Orion Failover Engine (see Understanding the Orion Failover Engine Architecture) delivers several levels of protection

 

 

image_thumb_4E9DC9E1.png

 

 

MSP-type deployment

 

  • Today, there are two recommended ways to deal with MSP-type deployments of Orion, where an MSP manages customer networks that potentially have overlapping IP addresses:

 

NAT-based deployment: Network Address Translators translate the customer domain addresses, so that they are all unique from an Orion perspective

 

EOC-based deployment: a full instance of Orion is deployed per Customer and they are consolidated at the MSP level by EOC
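
To make the overlapping-address problem concrete, here is a minimal Python sketch that checks whether the address ranges of different customers collide. The customer names and subnets are invented for illustration, and the check uses only the standard `ipaddress` module; it is not part of Orion.

```python
import ipaddress
from itertools import combinations


def overlapping_ranges(customer_subnets):
    """Return (customer, subnet, customer, subnet) tuples for ranges that overlap.

    customer_subnets maps a customer name to a list of CIDR strings
    (all values here are hypothetical).
    """
    nets = [
        (name, ipaddress.ip_network(cidr))
        for name, cidrs in customer_subnets.items()
        for cidr in cidrs
    ]
    clashes = []
    for (name_a, net_a), (name_b, net_b) in combinations(nets, 2):
        # Only ranges belonging to different customers are a problem for the MSP.
        if name_a != name_b and net_a.overlaps(net_b):
            clashes.append((name_a, str(net_a), name_b, str(net_b)))
    return clashes


customers = {
    "customer_a": ["10.0.0.0/24"],
    "customer_b": ["10.0.0.0/25", "192.168.1.0/24"],
}
clashes = overlapping_ranges(customers)
print(clashes)
```

If the list comes back non-empty, a single Orion instance cannot tell the clashing devices apart by IP address alone, which is the situation that the NAT-based and EOC-based options are meant to address.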

 

  • NAT-based deployment

 

NAT eliminates overlapping IP addresses

 

Makes identification of managed devices more complex, because the translated IPs are meaningless to report readers. This can be addressed by populating custom properties with IPs or names that are not affected by any translation.

 

image_thumb_4A6A4303.png
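
One way to picture the custom-property workaround is the small Python sketch below. It is only an illustration: the `ManagedNode` class, its field names, and the addresses are hypothetical and do not correspond to an Orion API or database schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ManagedNode:
    translated_ip: str  # post-NAT address that the monitoring server polls
    customer: str       # hypothetical custom property: owning customer
    original_ip: str    # hypothetical custom property: address inside the customer network


def report_label(node: ManagedNode) -> str:
    """Build a label that stays meaningful to a reader on the customer side."""
    return f"{node.customer} / {node.original_ip} (polled as {node.translated_ip})"


# Two customers reuse 10.0.0.1; NAT gives each a unique translated address.
nodes = [
    ManagedNode("172.16.1.10", "Customer A", "10.0.0.1"),
    ManagedNode("172.16.2.10", "Customer B", "10.0.0.1"),
]
labels = [report_label(n) for n in nodes]
for label in labels:
    print(label)
```

The translated addresses stay unique for polling, while the report shows the customer name and the address the customer actually recognizes.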

 

  • EOC-based deployment

 

image_thumb_42922A94.png

       

  • More on MSP deployment in general and multi-tenancy in particular here
53 Comments
Level 9

Unfortunately, while the EOC is helpful to an administrator, how do you get separate databases to appear as one?

Right now I have a PRIMARY NPM server and 11 Polling Engines, with a standalone Database.

The real lack of scalability comes from the limited number of elements that a single polling engine can monitor. It has improved from the previous 9,000 to 12,000 elements, but with devices/nodes that have large numbers of interfaces, some with 4,000, you can't get very many onto a poller. I'm forced to track elements instead of managing the system.
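
The element arithmetic behind this comment can be sketched as follows. This is a rough illustration in Python, assuming the 12,000-elements-per-poller figure quoted above and treating each device's element count as indivisible. The device counts are made up, and first-fit decreasing is a common packing heuristic, not how Orion assigns nodes to pollers.

```python
ELEMENTS_PER_POLLER = 12_000  # figure quoted in the comment above, not a verified limit


def pack_onto_pollers(element_counts, limit=ELEMENTS_PER_POLLER):
    """Assign devices to pollers with first-fit decreasing.

    element_counts: number of elements per device (hypothetical numbers).
    Returns a list holding the total element load of each poller.
    """
    loads = []
    for count in sorted(element_counts, reverse=True):
        if count > limit:
            raise ValueError(f"a device with {count} elements exceeds the limit")
        for i, load in enumerate(loads):
            if load + count <= limit:
                loads[i] += count  # fits on an existing poller
                break
        else:
            loads.append(count)  # no poller had room, so add another one
    return loads


# Hypothetical estate: 7 large devices with 4,000 elements each,
# plus 100 small devices with 50 elements each.
devices = [4000] * 7 + [50] * 100
loads = pack_onto_pollers(devices)
print(len(loads), loads)
```

With these made-up numbers, three 4,000-element devices fill a poller on their own, which is why the commenter ends up tracking element counts rather than node counts.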

Very good post.

How would one suggest that SAM be deployed across an environment where data centres are geographically separated? My primary data center is hosted in the UK with smaller sites in the US and Australia. The US and Australia sites only have up to 100 nodes (max), so this may not warrant a full-scale EOC deployment. Is a separate poller in each location the only way to go in this regard?

Appreciate any suggestions.

Thanks, Dale

Product Manager

SAM 5.2 and earlier do not support remote poller deployments. Additional pollers are fully supported, but they should be located on the same LAN as the SQL database server (i.e., low latency). SAM 5.5, currently in beta, does support remote additional pollers, which can be geographically dispersed to remote locations away from the SQL database server. With SAM 5.2, I would recommend a SAM instance for each of your US and Australia locations, rolled up to an EOC. However, if you have decent connectivity with relatively low latency to these locations from your UK office via a leased line or VPN, you may be successful polling these servers remotely from a centralized location.

Many thanks for the response.

Level 9

I was told that the new release of NPM 10.4 would allow for 3 polling engine clusters presenting a single IP to the network.

Is this the case? If so a diagram of how this looks would be helpful.

Thanks

Radioman

Level 9

Nice Blog

It does explain a lot.

Is there a document like this in PDF format somewhere on the website?

Is there perhaps an updated version of this?

Level 15

Thanks.

No PDF documents.

We'll look into an update. It's still mostly accurate, but a few points need updating.

Level 13

I see no mention of the GUI rewrite based on the requested HTML5 engine.

The current GUI is very out of date, and many users are complaining about items like auto refresh.

Imagine if Google Maps was not based on HTML5.

When will SW update the Orion core engine?

Level 7

Great idea..

Level 11

Good one

Level 10

Solarwinds people,

Looking over these diagrams, something is sticking out at me. You have a heavy reliance on scaling CoreDB SQL services UP and not out. Are there any expectations that you'll start to use a myriad of specialized services to accommodate the scale of your customers, e.g. Memcached, NoSQL, application routing?

Additionally, more web services don't fix many problems. You need to allow web gardens for application pool distribution. Too many times I find the entire system locked up because of a long-running process in the application pool.

OH man. +10 to this idea. I'm in this right now trying to map our architecture for the next 4-6 years.

Level 8

Hi @fcaron

I've logged a call for something related to this with the SolarWinds team.

Do you know what is the difference in traffic when:

  • you use traffic from the polling engine across the WAN vs polling traffic across the WAN.
  • When would you recommend a HA DB cluster
Level 13

When will SW update the Orion core engine to be based on HTML5?


Level 12

Just a question: what's SolarWinds' advice on using load balancers such as Citrix NetScaler? Can you do it with a core and a web server, or would you have to have two web servers as a front end with the core in the backend?

Product Manager

Network Load Balancers are commonly used with a pool of Additional Web Servers. They cannot really be used with the main Orion poller, since only one main Orion poller can be active at any given time. For high-availability Orion instances you would need to use something like the Failover Engine. You could, however, load balance the web traffic going to the primary Orion instance and an Additional Web Server using a Network Load Balancer like an F5, NetScaler, or even Cisco's Server Load Balancing feature.

Level 8

I have 2 web servers working quite effectively.

The named space is then advertised and always available.

Level 8

What is your element count per polling engine?

Are your polling engines all remotely located?

What is the traffic throughput from the polling engine to the DB?

Level 12

very good post ........

Level 8

Thanks for this post! Very good breakdown.

Level 9

thanks fcaron..its good

Nicely done.  Thank you.

Level 12

This post is from 2011, surely some of these practices have changed...

Level 8

Great! 

Level 11

Could we have an up-to-date version?

Level 13

Adding Referenced post when accessing Orion from WAN

Secured WAN access with additional web server in DMZ

Good topic.  Keep it coming. 

MVP

Very nice overview of the Orion platform.

Level 9

We are running an Active/Active implementation across two distinct data centers. Each data center has 8 pollers and a dedicated database server. Our challenge is that there is no way we would ever convince our admins to make changes in two environments, so we need a way to synchronize the databases if a DR situation occurs (sync the primary to the DR database). Prior to NPM 12.0 we had a solid solution; however, NPM 12 moved the licensing into the database, which effectively broke our current solution. I can provide a lot more detail, but what are your thoughts?

Product Manager

As stated in the NPM 'What we're working on' post, we are actively working to develop a new disaster recovery solution for Orion. In the interim, there have been several customers who have been successful utilizing HA in a multi-subnet configuration.

SolarWinds High Availability - HA - In a WAN Environment!!!

High Availability and Disaster Recovery Solution with Full Servers and Site Protection - RFC/Design ...

MVP

I'm visual so the diagrams really add clarity for me.

I still like this descriptive blog.  Are there newer descriptions of SW products and their growth that can be shared?

Level 10

I found this to be interesting and helpful. I will be able to reference back to this as needed.

Good one.  Architecture is a huge topic these days. 

Level 12

fcaron, very good post and clear infrastructure description. Just one note: shouldn't it be updated with the new license schemas (NOC and NAC), since it looks like the post is still being read?

Level 8

Thanks for the info.

Level 9

Very nice and useful

MVP

This is helpful in understanding how things "work" together.

Level 7

Hi,

How would the other products, viz. Virtualization Manager, Storage Resource Monitor, Patch Manager, and Server Configuration Monitor, be deployed? Is there an architecture document depicting a typical deployment of all products?

Product Manager

VMAN, SRM, and SCM follow the same architecture as the rest of Orion. Patch Manager runs on its own separate server and can be optionally integrated into the Orion webUI.

Level 9

We are running a POC with the new HA solution now.  We started testing the multi-segment Always On database configuration this morning.  So far we are extremely impressed.  CJ is providing oversight, albeit to this point we have had no issues.  After we complete the database testing we will start on the Orion HA testing.  In our POC we have 2 pollers per data center and 2 WPM player servers per data center.  It's looking real good, we are very hopeful it will work as we expect.

Level 11

Is SolarWinds planning to be available as a SaaS solution?

SolarWinds is not on the short list (for a POC) at my new company because of that.

We have more than 100k nodes and plan to be fully in the cloud shortly.

Product Manager

Orion is not available as SaaS, but we have lots of customers that run it in the cloud on IaaS, using things like Amazon EC2 or Azure VMs. The harsh reality is that cloud-native tools are, by their very nature, incredibly immature. It will be many years (if ever) before they're anywhere close to doing what Orion can do today.

Level 9

ServiceNow does it pretty well, so does WorkDay.  But I am with you AlTeReGo, a monitoring tool/system with the amount of data flowing, good luck with that implementation 🙂

Level 11

Splunk Cloud, Dynatrace ...

Level 11

Datadog, PRTG...

Level 9

Does anyone have any updated architecture diagrams that show geographically separated sites using HA?

USA- Europe- Asia

I am trying to build HA, but I have the three separate sites listed above, with about 200 milliseconds of delay between each site. We cannot have a core and SQL at each site, so we cannot use EOC, and it's easier to maintain one core server instead of three. We are currently using Orion Platform 2017.3.4 SP4, IPAM 4.6.0, VNQM 4.4.1, NCM 7.7, NPM 12.2, DPAIM 11.1.0, QoE 2.4, NTA 4.2.3, VMAN 8.1.0, UDT 3.3.0, SAM 6.5.0, NetPath 1.1.2. We will be upgrading to the latest version this year. The connection between each site might be a 200Mb pipe and is not very reliable. We had an issue where NetFlow saturated our link, but it was just one time.

My idea is to set up full SolarWinds on the USA server; the remote sites in Europe and Asia will then have web servers, pollers with over 11,000 entities, NTA and maybe a SQL cluster.

Questions:

1: Can NTA be configured on USA server and then collect flow from Europe and Asia with the Pollers? Will the Delay and traffic cause any issues?

2: What other products should be installed and running at the remote sites to increase reliability and decrease network traffic across the WAN links.

3: Is there a way to fail over the core from the USA to the Europe site if HA is built in Europe? We have an F5 load balancer; is this the best method for failover?

4: Could I have multiple SQL servers on all 3 locations and then use SQL Clustering? Is the delay too high for this? Would this keep SQL traffic to its region, or would it still reach out to the USA SQL server when using the web application? I would like to keep most traffic to its region to prevent network saturation.
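
As a rough way to reason about question 4, the sketch below estimates how much time a chatty sequence of SQL queries spends waiting on the network. It is a back-of-the-envelope Python model under loud assumptions: the 200 ms figure is treated as a round-trip time, the queries run strictly one after another, and the count of 50 queries per page is invented for illustration.

```python
def network_wait_seconds(sequential_queries, round_trip_ms, server_time_ms=0.0):
    """Time spent on N strictly sequential queries.

    Each query is assumed to cost one full round trip plus an optional
    server-side processing time. All inputs are hypothetical.
    """
    return sequential_queries * (round_trip_ms + server_time_ms) / 1000.0


QUERIES_PER_PAGE = 50  # invented number, not a measured Orion value

same_lan = network_wait_seconds(QUERIES_PER_PAGE, round_trip_ms=1)
across_wan = network_wait_seconds(QUERIES_PER_PAGE, round_trip_ms=200)
print(f"same LAN:   {same_lan:.2f} s")
print(f"across WAN: {across_wan:.2f} s")
```

Under this simple model the wait grows linearly with the round-trip time, so whatever sits between the application and its SQL server dominates once the latency reaches hundreds of milliseconds. Real behaviour depends on how many queries actually run and whether they are batched or parallelized.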

mjalden1​, 

A couple of things you will need to consider in your environment:

1.  IP addressing of the servers.

2.  Patching of the Servers.  Is there a central location that will push patching or does each Datacenter do their own?

3.  Server Resources:  Are your servers appropriately resourced?  Do you have logical access to ALL Servers including SQL?

4.  Bandwidth between the sites.  Asia to US, Asia to Europe, US to Europe

5.  Any other pending issues or concerns that have been previously identified? 

Given that this thread is eight years old, you might have improved responses to your questions if you opened a new thread by clicking Create and selecting Discussion:

pastedImage_0.png

I have no direct experience with your specific environment or limitations, but my thoughts on your questions follow:

1: Can NTA be configured on USA server and then collect flow from Europe and Asia with the Pollers? Will the Delay and traffic cause any issues?    

Answer:  You should have no problems with NTA info sent from Europe and Asia to the USA.  Delay should also not be a problem.  Traffic from other sources may be a problem if it saturates your bandwidth, but it should not cause problems with NetFlow data unless saturation continues for excessively long periods, and even then the only impact I'd expect to see is delayed information.

Netflow data should not ever cause a pipe to become saturated.  Netflow data is a tiny fraction of the overall data passing through routed links.

2. What other products should be installed and running at the remote sites to increase reliability and decrease network traffic across the WAN links?

Answer: There are WAN compressing appliances (e.g.:  Riverbed, Steelhead, etc.) that can improve WAN throughput through compression and deduplication.  I am not advocating them, but I mention them because others have had success using these "bookended" devices (one goes at each end of a WAN circuit) to compress - decompress traffic so a circuit carries the least amount of wasteful traffic. 

Example:  In a medical environment, imagine needing to send an X-ray image that's mostly black background across a WAN.  It might be part of an image array of hundreds or thousands of images that can be used to animate a view of a patient's internal organs for analysis and diagnosis.  Each image might be 4 GB in size, but the majority of the data is identical black pixels.  You can imagine the wasted bandwidth.  A WAN optimizer can keep that redundant data from being sent across the WAN, sending instead a small data packet that tells the receiving WAN optimizer to recreate the data locally.  Optimizers don't work for all traffic types, especially video or voice.  Check them out and see if they are useful for your WAN traffic.

3: Is there a way to fail over the core from the USA to the Europe site if HA is built in Europe? We have an F5 load balancer; is this the best method for failover?

Answer: I'm unqualified to comment about core failover from Europe to USA via HA in Europe.  I have HA but have not implemented it, and my WAN extends only 500 miles.  I've used F5s for many applications with great success, but we've recently begun moving certain services off F5 onto Citrix NetScalers due to F5's limited support for, and compatibility with, Citrix thin clients.  NetScalers work better for thin client support than F5s.  You may find that NetScalers or other (software-based) load balancers suit your needs and limitations.

4: Could I have multiple SQL servers on all 3 locations and then use SQL Clustering? Is the delay too high for this? Would this keep SQL traffic to its region, or would it still reach out to the USA SQL server when using the web application? I would like to keep most traffic to its region to prevent network saturation.

Answer: I'll defer to SQL experts on this topic; I'm not a SQL admin.

Swift packets!

Rick Schroeder

Level 9

To me, it's a no-brainer, you have to do a proof-of-concept.  You are going to uncover things that most of us have not seen.  We are in flight with a much smaller HA implementation, 2 data centers, same city about 40 miles apart.  We have 8 pollers at each DC.  We are running about 75000 elements across this infra.

We have 20 Gb links between the data centers.  At each data center our 8 pollers and dedicated database server are on the same network segment.  While this won't work for you, to save on SQL cost we implemented SQL 2016 Standard at each data center using "Always On"; it works perfectly :-).

There is so much to consider with what you are proposing, but I feel it can be done in the right circumstances: network, physical servers, power.  Man, you're going to need it.  My primary pollers are Dell PowerEdge FC630s with 2 processors, 40 cores, and 64 GB of memory, running all SSD drives.

The remaining 14 pollers, 7 active and 7 DR, are a slightly smaller footprint.

Another very significant infra component we have is on our 2 dedicated database servers, our NetPerfMon database runs on 3.2 terabytes of FusionIO (it was the smallest size Dell sold), pretty much the fastest storage on the planet (take a very close look at this, if not FusionIO, SSD all day long), imo.

We are running Orion Platform 2018.4 HF3 and NPM 12.4.  It is imperative you get this patch CORE-12365.  I am quite sure it is "not" in our current code, meaning it will be a HF4,5,6 before it is included, aLTeReGo can comment on this possibly.  This patch addresses a RabbitMQ issue with HA.

Get with your sales guys and/or account rep and get trial licenses, set up your infra (small footprint) in each location, and start hammering away.

This is my opinion, how I would go at your challenge.

This gives me flashbacks to my days at JPMorgan Chase, 135,000 servers to manage, in almost every country on the planet with major data centers in AsiaPAC, Europe, and the US.  (We did not use Solarwinds there).

Good Luck, should be quite an endeavor.

About the Author
Francois joined the SW product management team in December 2010. He has been in the network management space for about 15 years, first in a startup company, then in one of the big 4, and then back in a human-size company. Despite his bizarre accent, he is a decent guy to talk to.