Orion Architecture

Legacy Page

This blog was posted in 2011 and is no longer accurate. For the latest information, check out the following resources:

SolarWinds Orion Platform Scalability Guide

Orion Platform Architecture and Deployment Options Video

This blog recaps, in relatively simple terms and diagrams, the basics of Orion’s architecture.

 

It is obviously not exhaustive in terms of the products and the deployment combinations, but it will hopefully give you the basic rules so you can easily derive and adapt them to your particular Orion deployment.

 

This blog is designed to help you get rapidly familiar with most of the concepts and terminology, but it does not replace the architectural considerations described in each product’s user documentation set.

 

It is also a good set of pointers to many excellent blog posts that have been written in the past on all these products and components. Just follow the links…

 

The “*” in front of the names in the diagrams below denotes commercial products. Boxes without “*” are modules that come with the Orion infrastructure and cannot be bought separately (e.g. Core).

 

I hope you’ll enjoy it. As always, post your comments and questions here; we’ll try to respond to them and improve this blog.

 

Basic software architecture

 

 

[Diagram: basic Orion software architecture]

 


 

Scalability: Growing an instance of Orion

 

  • With the Additional Web server (more users)
  • And the Additional Polling Engine (larger/more networks)
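To make the split concrete, here is a minimal sketch (node names, engine names, and the capacity figure are all invented; this is not any Orion API) of how managed nodes might be spread across a main poller and an Additional Polling Engine, with a per-engine element cap:

```python
# Hypothetical sketch: spreading managed nodes across polling engines.
# Node names, engine names, and the capacity figure are all invented.

def assign_pollers(nodes, engines, capacity=10000):
    """Round-robin nodes across engines, enforcing a per-engine
    element cap (Orion pollers are sized by element counts)."""
    assignments = {engine: [] for engine in engines}
    for i, node in enumerate(nodes):
        engine = engines[i % len(engines)]
        if len(assignments[engine]) >= capacity:
            raise RuntimeError("capacity exceeded: add another polling engine")
        assignments[engine].append(node)
    return assignments

nodes = [f"node-{n}" for n in range(6)]
plan = assign_pollers(nodes, ["main-poller", "additional-poller-1"])
```

When an engine would exceed its cap, the sketch fails loudly; in practice you would license another Additional Polling Engine and rebalance.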

 

[Diagram: scaling with an Additional Web Server and an Additional Polling Engine]

 


 

Scalability: Consolidating multiple instances of Orion

 

  • EOC can be used for scalability or organizational needs (e.g. regional and national responsibilities, with visualization offered at both levels)
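As a rough illustration of the idea (invented instance names; this is not EOC’s actual logic), a consolidation layer simply rolls the per-instance status up to the worst severity for the national view:

```python
# Illustrative roll-up of status across regional Orion instances,
# the way an EOC-style national view summarizes them. Names invented.
SEVERITY = {"up": 0, "warning": 1, "down": 2}

def rollup(regional_status):
    """Return the most severe status reported by any instance."""
    return max(regional_status.values(), key=lambda status: SEVERITY[status])

regions = {"orion-east": "up", "orion-west": "warning", "orion-emea": "up"}
worst = rollup(regions)  # the national view shows "warning"
```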

 

[Diagram: EOC consolidating multiple Orion instances]

 


 

Segmented deployments

 

  • Avoiding VPN accesses with the Additional Web Server

 

[Diagram: segmented deployment using an Additional Web Server]

 


Standalone products: APM, UDT, IPAM, NCM, SEUM

 

 

[Diagram: standalone products]

 


 

Standalone products (shared DB server)

 

  • It is possible to host several SQL Server databases on the same physical database server
  • We do not support two instances of Orion products sharing the same database, but multiple Orion databases can share the same SQL Server host if it is appropriately sized
  • We recommend running the DB server on a physical server (not a VM)
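To make the second bullet concrete, here is a tiny sketch (hypothetical host and database names) of two Orion instances sharing one SQL Server host while keeping strictly separate databases:

```python
# Hypothetical connection settings: one SQL Server host, two databases.
# Sharing the host is fine if it is sized for it; sharing a database is not.
instances = {
    "orion-site-a": {"host": "sql01.example.local", "database": "OrionSiteA"},
    "orion-site-b": {"host": "sql01.example.local", "database": "OrionSiteB"},
}

hosts = {cfg["host"] for cfg in instances.values()}
databases = [cfg["database"] for cfg in instances.values()]
assert len(hosts) == 1                        # same physical SQL Server
assert len(databases) == len(set(databases))  # but strictly distinct databases
```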

 

[Diagram: standalone products sharing a database server]

 


 

Highly Available Architecture: the Orion Failover Engine (Understanding the Orion Failover Engine Architecture) delivers several levels of protection

 

 

[Diagram: Orion Failover Engine protection levels]

 

 

MSP-type deployment

 

  • Today, there are two recommended ways to handle MSP-type deployments of Orion, where an MSP manages customer networks that may have overlapping IP addresses

 

NAT-based deployment: Network Address Translators translate the customer-domain addresses so that they are all unique from Orion’s perspective

 

EOC-based deployment: a full instance of Orion is deployed per customer, and the instances are consolidated at the MSP level by EOC

 

  • NAT-based deployment

 

NAT eliminates overlapping IP addresses

 

It makes identification of managed devices more complex, because the translated IPs don’t make sense to report readers. This can be addressed by populating custom properties with IPs or names that are not affected by any translation.
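A minimal sketch of both points, with invented addresses and custom-property names (not SolarWinds’ actual schema): the NAT table makes overlapping customer addresses unique, while a custom property preserves the original address for report readers:

```python
# Invented example: two customers reuse the same private address; NAT
# gives each a unique Orion-side address, and custom properties keep
# the untranslated IP readable in reports.
nat_table = {
    ("customer-a", "10.0.0.5"): "172.16.1.5",
    ("customer-b", "10.0.0.5"): "172.16.2.5",  # same private IP, no clash
}

def node_record(customer, private_ip):
    translated = nat_table[(customer, private_ip)]
    return {
        "IPAddress": translated,        # what Orion actually polls
        "Cust_OriginalIP": private_ip,  # custom property for report readers
        "Cust_Customer": customer,
    }

record = node_record("customer-b", "10.0.0.5")  # polls 172.16.2.5
```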

 

[Diagram: NAT-based MSP deployment]

 

  • EOC-based deployment

 

[Diagram: EOC-based MSP deployment]

       

  • More on MSP deployment in general and multi-tenancy in particular here
Anonymous
  • Thanks for the correction, aLTeReGO, and thanks for your help in getting through that one; it was a huge get.

  • rcbarr​, the case you reference above, CORE-12365, is specific to some benign errors in the error log files related to a legacy dependency that is no longer included in 2018.4 or later. Going through your case history, the tracking number for the real issue you encountered appears to be 'PRO-765', which was addressed in Orion Platform 2019.2.

  • To me, it's a no-brainer: you have to do a proof of concept. You are going to uncover things that most of us have not seen. We are in flight with a much smaller HA implementation: 2 data centers, same city, about 40 miles apart. We have 8 pollers at each DC. We are running about 75,000 elements across this infra.

    We have 20 Gb links between the data centers. At each data center, our 8 pollers and dedicated database server are on the same network segment. While this won't work for you, to save on SQL cost we implemented SQL Server 2016 Standard at each data center using "Always On"; it works perfectly :-).

    There is so much to consider with what you are proposing, but I feel it can be done in the right circumstances: network, physical servers, power; man, you're going to need it. My primary pollers are Dell PowerEdge FC630s: 2 processors, 40 cores, 64 GB of memory, running all SSD drives.

    The remaining 14 pollers (7 active, 7 DR) are a slightly smaller footprint.

    Another very significant infra component is on our 2 dedicated database servers: our NetPerfMon database runs on 3.2 terabytes of FusionIO (it was the smallest size Dell sold), pretty much the fastest storage on the planet. Take a very close look at this; if not FusionIO, SSD all day long, imo.

    We are running Orion Platform 2018.4 HF3 and NPM 12.4. It is imperative you get the patch for CORE-12365. I am quite sure it is "not" in our current code, meaning it will be HF4, 5, or 6 before it is included; aLTeReGo can possibly comment on this. This patch addresses a RabbitMQ issue with HA.

    Get with your sales guys and/or account rep and get trial licenses, set up your infra (small footprint) in each location, and start hammering away.

    This is my opinion, how I would go at your challenge.

    This gives me flashbacks to my days at JPMorgan Chase: 135,000 servers to manage, in almost every country on the planet, with major data centers in AsiaPAC, Europe, and the US. (We did not use SolarWinds there.)

    Good Luck, should be quite an endeavor.

  • Given that this thread is eight years old, you might get better responses to your questions if you opened a new thread by clicking Create and selecting Discussion:

    [Screenshot: the Create menu]

    I have no direct experience with your specific environment or limitations, but my thoughts on your questions follow:

    1: Can NTA be configured on the USA server and then collect flow from Europe and Asia with the pollers? Will the delay and traffic cause any issues?

    Answer: You should have no problems with NTA info sent from Europe and Asia to the USA. Delay should also not be a problem. Traffic from other sources may be a problem if it saturates your bandwidth, but it should not cause problems with NetFlow data unless saturation continues for excessively long periods; the only impact I'd expect to see is delayed information.

    NetFlow data should never cause a pipe to become saturated. NetFlow data is a tiny fraction of the overall data passing through routed links.
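A back-of-envelope check of that claim, using the NetFlow v5 export format (24-byte header, 48-byte flow records, up to 30 records per UDP packet):

```python
# Rough estimate of NetFlow v5 export bandwidth for a given flow rate.
def netflow_v5_bps(flows_per_second):
    records_per_packet = 30
    packet_bytes = 24 + 48 * records_per_packet  # 1464 bytes per full packet
    packets_per_second = flows_per_second / records_per_packet
    return packets_per_second * packet_bytes * 8  # bits per second

# Even 1,000 new flows per second exports at well under 1 Mbps.
assert netflow_v5_bps(1000) < 1_000_000
```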

    2. What other products should be installed and running at the remote sites to increase reliability and decrease network traffic across the WAN links?

    Answer: There are WAN-compressing appliances (e.g. Riverbed Steelhead) that can improve WAN throughput through compression and deduplication. I am not advocating them, but I mention them because others have had success using these "bookended" devices (one goes at each end of a WAN circuit) to compress and decompress traffic so a circuit carries the least amount of wasteful traffic.

    Example: In a medical environment, imagine needing to send an X-ray image that's mostly black background across a WAN. It might be part of an image array of hundreds or thousands of images that can be used to animate a view of a patient's internal organs for analysis and diagnosis. Each image might be 4 GB in size, but the majority of the data is identical black pixels. You can imagine the wasted bandwidth. A WAN optimizer can keep the wasted data from being sent across the WAN, while simultaneously sending a small data packet that instructs the receiving WAN optimizer to recreate that data locally instead of pulling it from across the WAN. Optimizers don't work for all traffic types, especially video or voice. Check them out and see if they are useful for your WAN traffic.
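The deduplication idea can be sketched in a few lines (heavily simplified; no vendor implements it exactly this way): chunks the far-end cache already holds are replaced on the wire by short hash references:

```python
# Simplified "bookended" deduplication: send raw bytes only for chunks
# the peer has never seen; otherwise send a short hash reference.
import hashlib

def optimize(data, peer_cache, chunk_size=4096):
    wire = []
    for i in range(0, len(data), chunk_size):
        block = data[i:i + chunk_size]
        digest = hashlib.sha256(block).hexdigest()
        if digest in peer_cache:
            wire.append(("ref", digest))   # a few dozen bytes, not the chunk
        else:
            peer_cache[digest] = block     # peer caches it on first sight
            wire.append(("raw", block))
    return wire

cache = {}
framed = optimize(b"\x00" * 8192, cache, chunk_size=4096)
# the second identical all-black chunk travels as a reference, not raw bytes
```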

    3: Is there a way to fail over core from the USA to the Europe site if HA is built in Europe? We have an F5 load balancer; is this the best method for failover?

    Answer: I'm unqualified to comment about core failover from Europe to the USA via HA in Europe. I have HA but have not implemented it, and my WAN extends only 500 miles. I've used F5s for many applications with great success, but we've recently begun moving certain services off of F5 onto Citrix NetScalers due to F5's limited support and compatibility for Citrix thin clients. NetScalers work better for thin-client support than F5s. You may find NetScalers, or other (software-based) load balancers, suit your needs and limitations.

    4: Could I have multiple SQL servers in all 3 locations and then use SQL clustering? Is the delay too high for this? Would this keep SQL traffic in its region, or would it still reach out to the USA SQL server when using the web application? I would like to keep most traffic in its region to prevent network saturation.

    Answer: I'll defer to SQL experts on this topic; I'm not a SQL admin.

    Swift packets!

    Rick Schroeder