Level 13

Cloud Fever, throwing all the SolarWinds into AWS

So after picking the amazing adatole​'s brain after his 3/13 LIVE WEBCAST: IF AN APPLICATION FAILS IN THE DATACENTER AND NO USERS ARE ON IT, WILL IT CUT A T...  presentation about migrating our SolarWinds environment to AWS, he recommended I let the THWACK community weigh in.

Here's my situation in a nutshell:

Currently, our Orion SQL DB is in a SQL cluster shared by other applications.  The Orion DB is killing performance for the other DBs in that cluster. Based on current resource utilization and guidelines for future growth, I'm looking at getting the DB its own "server", whether virtual or physical, with at least 128GB of RAM.  The SQL cluster has 64GB of shared RAM.  We're not able to deploy a virtual server with that much RAM in our virtual environment.

So what about a physical server?

Well...

My company has caught the "Cloud Fever" and the only cure is more Cloud!

Our parent company based in France has pushed out an IT edict that all 26 of its international entities (North America, UK, China, etc...) must convert 50% of their data center based hardware and virtual servers into "the cloud" by 2020. Unfortunately, cost concerns and performance be damned, nobody is asking the important questions like "Why?"

So I'm being told that with this edict in play, any requests for a new hardware server would be instantly denied.

Given this, I'm trying to make it work as well as possible.  The slight advantage I have is that massive amounts of money are being thrown at this cloud effort, so I can leverage that to make this as smooth as possible.

Another side issue is that our IT department is extremely siloed. My title is "Network Engineer" which means I'm a member of the "Network Team". However, it's 2018 and IT should all be on the same team.  The server team is full of old school Microsoft fanboys and girls that have fought AWS tooth and nail (and not for logical reasons). We have a very developed and robust Orion environment with 3 very dedicated individuals maintaining it and end users actively using it across many teams including some outside of IT.

The server team uses a neglected instance of SCOM 2012 to monitor servers, AD, and databases, using mostly out-of-the-box alerts that are only sent to members of their team and a web portal that is only accessible by them.

I have graciously offered to take on the task of assisting them with integrating our servers and AD environment into SAM, which would incur no cost as we are already licensed. I get immediate pushback with no logical reasons, almost like it's some sort of childish turf war for them.  So, asking them for any assistance with the Orion servers is a pain because I offered to help them.

Here's what my Orion environment looks like now:

NAM 3000 with ACM 250. We're using NPM, SAM, NTA, NCM, IPAM, VNQM, UDT, and WPM.

Main PE: polling 12893 Elements with a job weight of 5650.

APE in Canada whose local subnets aren't routed to the "Main" network hence the need for an APE: polling 5008 elements with a job weight of 1929.

APE in our SCADA industrial controls system DMZ (we're planning on rolling this into the MPE since we should be able to poll these nodes with "routing and firewall magic").

Our AWS environment is in the very earliest stages, only 1 test application has been migrated so far, so I have a lot of freedom to plan out how to monitor that.

Our AT&T managed MPLS cloud has a direct connect into our AWS instance so, that should help alleviate some latency issues with our remote location polling.

Some of the advice Leon offered includes the following:

  1. Installing SW into AWS
     1. The first and most important thing you need to ensure is that the timing between the primary poller and the database remains low – under 1500 milliseconds. If you have latency that is longer than that, you are going to experience errors and data corruption.
     2. The second (and only slightly less important) thing is to ensure that your database is set up for the transaction volume – in on-prem terms, it needs to be RAID 10 or flash. Not RAID 5.
     3. The third thing is that you will likely be monitoring your on-prem environment using an additional polling engine, unless you have fewer than 100 devices on-prem that you wish to monitor.
With all of that said, there is a guide to help you:

1)      Put the primary poller and the db in the cloud so that your timing between them is as short as possible. The primary poller will have very little to monitor (at least right now) and That’s OK ™

2)      Put an additional poller in the main site, and another APE in your secondary site. They cost nothing, so why not. They can be virtual. You can play with the hardware they’re assigned until you’ve salted to taste.

3)      If you can, install the AWS-based instance of DPA (it’s in the Amazon store) and watch your SW database with it. You will have the ability to see how it’s truly performing and where any bottlenecks might crop up.

a.       It’s also a great “advertisement” to your DBA team to show the capabilities of the tool. No I’m not trying to upsell you. It’s just a nice tool in your toolbox if you don’t have something else. And it’s natively cloud-based, so you can score some points from corporate.

4)      Make sure you add your cloud credentials to SAM. Again, score some corp brownie points.
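For anyone wanting to sanity-check the latency point above before committing to a design, here is a minimal Python sketch. It is not a SolarWinds tool and not how Orion itself measures latency. It simply times TCP connections from a would-be polling engine to the database host as a rough stand-in for round-trip time. The function names, the sample count, and the default port 1433 (the SQL Server default) are assumptions for illustration, and the 1500 ms default is just the figure quoted above.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=5.0):
    """Time one TCP connection to host:port, in milliseconds."""
    start = time.perf_counter()
    # Opening and immediately closing the connection costs roughly one round trip.
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0


def summarize(samples_ms, threshold_ms=1500.0):
    """Summarize latency samples and count how many exceed the threshold."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    ordered = sorted(samples_ms)
    return {
        "min_ms": ordered[0],
        "median_ms": statistics.median(ordered),
        "max_ms": ordered[-1],
        "over_threshold": sum(1 for s in ordered if s > threshold_ms),
    }


def probe(host, port=1433, count=10, pause_s=0.5, threshold_ms=1500.0):
    """Take `count` connection timings to the DB host and summarize them."""
    samples = []
    for _ in range(count):
        samples.append(tcp_connect_ms(host, port))
        time.sleep(pause_s)
    return summarize(samples, threshold_ms)
```

Run from the server that would host a poller, something like `probe("orion-db.example.internal")` (a placeholder hostname) returns the min, median, and max connect times. Pass a lower `threshold_ms` if you want to hold yourself to a tighter target.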

Paging jbiggley at adatole's suggestion to weigh in.
TL;DR I have to move my Orion environment to AWS because of corporate politics. Any advice is very appreciated.
So, what THWACKsters out there have installed/migrated Orion in the Cloud either by choice or by corporate politics gunpoint?

Edit for grammar DERP.

15 Replies
Level 13

Sorry, I know the follow-up is long overdue.  I'm doing a lot of travel for work over the next few weeks. I haven't forgotten about this, and once things calm down (do they ever?) I'll write my follow-up.  In the meantime, if anyone has specific questions about Orion AWS deployments, throw them in here or DM me and I'll respond when possible.

0 Kudos

Will you take cash payments to reveal Orion in AWS mysteries?

0 Kudos

LOL! I don't think that would be ethical but, I may accept Whiskey as compensation. I'm finishing up work travel in the next few weeks (and missing THWACKcamp) so hopefully I can post my update soon...

On Thu, Oct 18, 2018 at 3:03 PM hpstech

0 Kudos
Level 13

Any updates on running the full SolarWinds suite in the AWS cloud?

What about HA - on-prem Primary Polling engine and having HA server in AWS?

We are heading in that direction. We bought the HA module and will be building out a SQL AlwaysOn DB and placing an HA engine in AWS.

0 Kudos
Level 7

Interested to know if you did deploy into AWS? I am looking at Azure for the same reasons: my company wants to be 100% cloud native by 2021. I would like to move my SolarWinds to Azure and put pollers on prem. Latency is well below 1500ms.

0 Kudos

Azure SQL DB support has been officially added to the Orion Platform. You can learn more about this deployment option at the link below:

https://thwack.solarwinds.com/docs/DOC-204353#jive_content_id_Azure_SQL_DB_Support

trevski, 1500 ms is likely to be pretty painful. SolarWinds recommends keeping the latency under 500 ms between any APE and the database server.

- Marc Netterfield, Github
0 Kudos

I did!  I'm working out the kinks specific to our environment but, SolarWinds has handled the migration smoothly.  I'm planning on doing a full write up after things calm down a bit here.

I have 3 APEs, including one that took the place of the former MPE.  The former MPE kept the same IP it had as the MPE, since that IP couldn't be cleanly used on the new MPE. Most of the Netflow from our WAN managed service provider's routers is being sent to it.  I was concerned about latency issues with connectivity between the APEs and the MPE/DB.  After more than 2 weeks of running this model, performance is as good as, or in some cases better than, it was prior to the migration.

We would love to hear about the full details of your migration as well.

0 Kudos
Level 13

So, I submitted a support case for this (Case # 69996 - Orion AWS Migration Assistance) to get more guidance on migration to AWS.

I received the following reply:

Thank you for contacting SolarWinds Technical Support.

My name is (redacted) and I will be working on this case with you.

We are not currently supporting the installation of Orion to AWS or Azure clouds, but we have received reports of customers successfully running Orion in the AWS cloud. And if you run into any issues we will help you resolve those issues. If you would like to schedule a call next week, please reply to this email and I will have one of the senior techs in this office schedule something with you.

The biggest issue I am aware of is latency between the Orion polling engines and database machines. So please make sure the virtual machines meet our system requirements and latency is within our requirements. Here's a link to our multiple module system guidelines. Since you have unlimited licenses, please use the large deployment recommendations.

https://support.solarwinds.com/Success_Center/Orion_Platform/Orion_Documentation/Orion_Platform_Admi...

And here is the migration guide:

https://support.solarwinds.com/Success_Center/Network_Performance_Monitor_(NPM)/Migration_Guide

A new installation of Solarwinds products comes with a 30 day license so you may consider installing Orion on a cloud VM, creating a new database, and comparing performance of the cloud based implementation to the local implementation.

Cordially,

(redacted)

SolarWinds Technical Support

If this is the case, why is there documentation to do so on the success center?

https://support.solarwinds.com/@api/deki/files/40251/SolarWinds_AmazonWebService_Deployment.pdf?revi... 

I have specific questions regarding migration to AWS but, as noted in the referenced documentation this isn't covered in it.


adatole, cobrien, jbiggley: Any input?

I responded in private email, but the upshot is that SolarWinds wants to provide tools to help our customers be successful. That said, officially supported configurations often lag slightly behind what CAN be done, and sometimes behind what IS BEING done, until we have sufficient use cases to build troubleshooting documentation, best practices, etc.

Leon Adato | Head Geek
------
"Measure what is measurable,
and make measurable what is not so." - Galileo

Makes sense, pretty hard to expect support techs to know anything about a scheme that only a few system engineers on the planet have done successfully.

So far I've not talked to anyone who was satisfied with their experience of testing Orion app servers and DB in a cloud. You could probably swing some APEs and definitely the additional web servers, but aside from the political situation described in this post and maybe a small environment that doesn't have much/any existing on-prem server infrastructure, I don't think Orion is a very good example of the kind of software that people should be pushing into their clouds.

The workloads for Orion tend to be pretty consistent, so on-demand scaling up and down isn't really a thing most people need for it, not to mention that it isn't exactly easy to automate spinning up new servers for the SolarWinds environment.  The DB workload is pretty significant, so it tends to need more than you can get from the cheaper instances.  It is primarily an internal tool and requires deep access to your infrastructure, so it can be somewhat painful to get all your security and access control bits in place if your org takes a really strict stance.  You also almost never serve content from it up to outside consumers, so your consumption is still often from the same places where the infrastructure you want to monitor already sits, and putting the tool in the cloud just increases your data's round trip.  If your org is basically "cloud native" then it would seem that you might be looking for something a little more SaaS-y like AppOptics.

Just my thoughts, would love to hear from anyone who has successfully moved Orion off prem though.

- Marc Netterfield, Github

#ChallengeAccepted

jbiggley​ and I will have to talk about this in a few months time.

I recall hearing you mention that you were working on a cloud project. I had assumed this was just with distributed NetPath agents.  Are you actually looking to move the core application and/or DB off prem?

- Marc Netterfield, Github
Level 13

Monday bump

0 Kudos