
Cloud Maintenance: Turn your availability alerting upside down for cloud instances _without_ doubling the administrative effort!

Disclaimer: You can do some of this already, by having two alerts for each group of cloudy elements, but let's make Orion think smart about the cloud!

Picture this:

You have a group, and this group contains Azure or EC2 resource group nodes, and associated elements. Say they have applications assigned to them as well, for good measure.

Got it? Great. So, let's take a look at the alerts which encompass these devices:

You have alerts which include:

Node down
High response time
High memory use
High CPU usage
An application template which looks at "Critical App#1"

You get the idea! These all work to notify you when any of these go down, or breach their thresholds. Sounds like we have everything covered, right? Wrong! The bean counters have decreed that all cloud-based servers are turned OFF out of hours!

Our operations teams have been inundated by alerts and our ITSM is chock-full of tickets. In short, Ops are sad-pandas. So, how can we fix this WITHOUT setting active hours on our alerts, and then creating yet another alert to tell us if somebody forgot to turn Skynet off? Simple! We use Cloud Maintenance!

This is an expansion of Orion's cloud management. By setting rules for cloud-based elements, you tell Orion which times are production hours for a given resource group, and which times all elements should be dormant. The Cloud Maintenance settings tell Orion how to handle alerting, how to display suspended instances while they are dormant, and how to deal with devices which breach their dormancy period.

Here's how I would see this working:

When you set up your new cloud integration, an additional page of options is displayed, which allows you to configure Cloud Maintenance.
This allows you to set dormancy intervals, per resource group. When a resource group is dormant, all alerts are muted OTHER THAN the Cloud Maintenance alert!
You can create more than one Cloud Maintenance alert and assign them per Resource Group if required. The alert notifies your chosen recipients when any instance in a resource group which should be powered off is still up after the dormancy period starts.
Rules within Cloud Maintenance alerts allow you to AUTOMATICALLY power down said resource (if enabled), in a similar fashion to the way VMAN allows users to automatically manage resources within vCenters / Hyper-V. Peace of mind for all involved.
Overrides can be set, per node (if managed as an Orion node), for patching etc., in a similar fashion to how node maintenance works now.
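
To make the rules above a bit more concrete, here's a rough sketch of the decision logic in Python. None of this is an existing Orion API: `DormancyWindow`, `ResourceGroup` and `alert_allowed` are names I've made up purely for illustration, and it assumes one daily dormancy interval per resource group, with the window allowed to wrap past midnight.

```python
from dataclasses import dataclass, field
from datetime import datetime, time


@dataclass
class DormancyWindow:
    """Hypothetical daily dormancy interval for one resource group."""
    start: time  # when instances should be powered off
    end: time    # when production hours resume

    def contains(self, moment: datetime) -> bool:
        t = moment.time()
        if self.start <= self.end:
            return self.start <= t < self.end
        # Window wraps past midnight, e.g. 19:00 -> 07:00.
        return t >= self.start or t < self.end


@dataclass
class ResourceGroup:
    name: str
    window: DormancyWindow
    # Hypothetical per-node overrides (e.g. nodes being patched tonight).
    overrides: set = field(default_factory=set)


def alert_allowed(alert_name: str, node: str, node_is_up: bool,
                  group: ResourceGroup, now: datetime) -> bool:
    """Return True if the named alert is allowed to fire for this node."""
    dormant = group.window.contains(now) and node not in group.overrides
    if alert_name == "Cloud Maintenance":
        # Fires only when a node that should be powered off is still up.
        return dormant and node_is_up
    # Every other alert is muted while the group is dormant.
    return not dormant
```

With a window of 19:00 to 07:00, a "Node down" alert for a dormant node is suppressed overnight, while the Cloud Maintenance alert is the only one that can fire, and only for nodes still running. A node listed in `overrides` is treated as if it were in production hours.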

Not only would this streamline the management of cloud resources, but it would also allow organisations to ensure the mandatory power-down rules are respected.

Dormant nodes which are powered off will have a new status and colour within Orion, perhaps a faded cloud? Whatever it is, it'll be obvious when you see it. Similarly, cloud instances which are breaching the dormancy rule will have another clear status, and there should be a dashboard widget which covers this.
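
As a rough illustration of the two new statuses and the widget, here's a small Python sketch. The `CloudStatus` names and the `classify` / `widget_counts` helpers are hypothetical, not anything Orion exposes today. It assumes we already know whether an instance is powered on and whether its resource group is currently dormant.

```python
from collections import Counter
from enum import Enum


class CloudStatus(Enum):
    UP = "up"
    DOWN = "down"
    DORMANT = "dormant"               # powered off, as expected
    BREACHING = "breaching dormancy"  # still up when it should be off


def classify(powered_on: bool, group_dormant: bool) -> CloudStatus:
    """Map power state plus dormancy state to a display status."""
    if group_dormant:
        return CloudStatus.BREACHING if powered_on else CloudStatus.DORMANT
    return CloudStatus.UP if powered_on else CloudStatus.DOWN


def widget_counts(instances) -> Counter:
    """Tally statuses for a dashboard widget.

    `instances` is an iterable of (powered_on, group_dormant) pairs.
    """
    return Counter(classify(on, dormant) for on, dormant in instances)
```

The point of keeping DORMANT separate from DOWN is that a powered-off instance during dormancy is the expected state, so it shouldn't be coloured like an outage.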

This is very much version 1 of this idea, but for me this would be a natural evolution of cloud support. Keeping down the cost of the cloud should be something we all want to optimise, and having NPM help with that will go a long way to driving home the relevance (and expertise) of Orion in the hybrid-IT era.

Comments welcome, Thwacksters!

Edit: Renamed to "Cloud Maintenance" due to sensible feedback! Not everything has to have a buzz word, after all.

Tags: cloud monitoring, npm, alert noise

Status: None

Comments

designerfx

This is the epitome of how cloud functions. I wouldn't call it cloud switch, maybe just cloud maintenance schedule. That way you can assign it per virtual data center. Otherwise I would have to do some custom property shenanigans via prod / non prod to even attempt this.

silverbacksays

You know, that's true. I've taken that advice and applied it.

Not everything has to have a fancy buzz word! Cheers designerfx

m_roberts

There are quite a few similar feature requests, where the underlying requirement is to be able to schedule at object level (Node, Application, Cloud) recurring maintenance windows. We used to be able to do this in Orion with the Win32 app on the Orion server when alerting was console and not web based. Lots of polite requests for it to return.

https://thwack.solarwinds.com/ideas/2655#start=275

Vote on them all!