IT Operations: Coping with Cloud

MVP

You’ll find no shortage of blog posts and thought pieces about how cloud computing has forever changed the IT landscape. The topic is usually addressed in an “adapt or die” argument: cloud is coming for your on-prem applications, and without applications in your data center, there’s no need for IT operations. You’ll encounter terms like “dinosaur” and “curmudgeon” when bloggers refer to IT professionals who have decades of experience in the data center, but have not yet mastered the skills necessary to manage a hybrid or cloud-native environment. It’s a bit self-serving. While it’s partially true that cloud will drastically change how your Ops staff go about their work day, the notion that on-prem IT is dead is a bit much.

At least for now.

While you could opt to continue about your day managing on-prem like cloud is just a fad (remember when that was the overwhelming sentiment in the mid-2000s?), you’d be wise to use this time to determine if your IT operations staff is structured to facilitate a pivot to the cloud. The journey from IT Ops to No-Ops is a long one, but rest assured, it will happen. So how do you get your intrepid Ops teams ready?

OMG NO MOAR SILOS

Dividing operations teams into groups based on common technical skillsets and resources is a deceptive practice when it comes to cloud. In many cases, these silos are the result of contracting preferences: many large organizations find it easy to contract for a specific skillset (e.g., let’s hire a company to handle and monitor our Windows servers). That contract is then managed by a single manager who simply refers to the team as “the Windows team.” Or maybe your mainframe days are not too far in the distance, and the big iron crowd referred pejoratively to the new IT staff as “the Windoze team,” and it stuck. A decade or two later, and you find yourself with the familiar silos: Windows, storage, network.

This is not an unreasonable approach to managing legacy on-prem. But it does get complicated when you layer on abstraction technologies like virtualization. And it gets absolutely bonkers when you go hybrid IT. Don’t believe me? Take your most experienced storage engineer and put her in front of the Google Cloud Storage console for the first time. Suddenly, all that experience and knowledge of storage protocols and disk partitioning strategies becomes irrelevant. Successfully managing cloud storage is an exercise in working directly with your development teams, becoming fluent in their development and deployment practices, and ceding control of the storage infrastructure to a cloud service provider.

The same is true for staff who may find themselves provisioning Kubernetes (that’s k8s for the cool kids) pods after deploying on-prem VMs for a decade. The very nature of provisioning resources in any cloud is an application-centric endeavor; it’s time to let go of the VM as the ideal unit of abstraction. We don’t have the problems in the cloud that pushed us to x86 virtualization in the data center.

One option is to shake up your teams a bit, and look to Agile team-building practices to guide your transition. Look beyond the eye-rolling crowd here; not everyone likes Agile and nearly everyone has a bad story about a failed shift. Build teams that cover the whole spectrum of your IT needs, and give those teams time to gel. Forming, storming, and norming are essential to achieve that sought-after team stage of performing.

By decoupling your teams from discrete IT resources, you’re encouraging a culture change that’s essential to adopting cloud.

Training for the Cloud

IT Ops veterans acquire a wealth of knowledge about more than just how IT works in your organization: they know how to find points of failure in systems and how to design around those failures. They have intimate knowledge of your applications, and know which ones are more politically sensitive than others. And their diagnostic and troubleshooting skills are second to none. These are the exact skillsets you want in a cloud engineer, as these skills are not specific to on-prem infrastructure. With the right training, your staff can easily adopt cloud services. For example, a systems engineer who knows that you never put all of your VMs in a single chassis would quickly realize that you should never put all of your cloud VMs in a single region. Encouraging your on-prem staff to become familiar with the lexicon of cloud will make your journey to the cloud that much simpler.
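
To make the chassis-to-region analogy concrete, here is a minimal sketch of the idea in Python. It is not tied to any cloud provider's API: the VM names, the region names, and both helper functions are hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import cycle


def spread_across_regions(vm_names, regions):
    """Assign VMs to regions round-robin so no single region holds them all."""
    if len(regions) < 2:
        raise ValueError("need at least two regions to avoid a single point of failure")
    return dict(zip(vm_names, cycle(regions)))


def largest_region_share(placement):
    """Return the fraction of VMs sitting in the most heavily loaded region."""
    counts = Counter(placement.values())
    return max(counts.values()) / len(placement)


# Hypothetical VM and region names, used only for illustration.
placement = spread_across_regions(
    ["web-1", "web-2", "web-3", "web-4"],
    ["region-a", "region-b"],
)
```

A share of 1.0 from `largest_region_share` is the cloud equivalent of every VM living in one chassis; the same instinct that catches that on-prem catches it here.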

Make Mistakes in Cloud Labs

Most CSPs provide free or low-cost access to lab environments so your staff can get acquainted with the UX of the cloud management console. Or you may allocate some budget to each of your IT ops staff to use in the provisioning of various cloud microservices. Whatever your method, give your Ops staff a safe place to make mistakes. They surely will, and you want those mistakes to be isolated from your production environment. Your staff will gain valuable first-hand experience with these new tools.

Just the Beginning

It’s folly to believe you can describe how to prepare your Ops teams for the cloud with just a few ideas. The transition to cloud is a journey, as they say. That word is carefully selected to reflect the long and arduous nature of moving from all on-prem to hybrid IT. Don’t expect your staff to simply become site reliability engineers overnight. Give them time and the resources to adapt and grow.

34 Comments
Level 14

Thanks for the article!  The recent Zoom outage citing AWS as the cause proves that there can still be problems in the cloud. 

Level 14

Not all clouds have a silver lining....

This is an excellent read.

Level 13

Thanks for the article. I remember the predictions of the death of the mainframe, but that took a lot longer than everyone was predicting. The key is finding the right solution for the application, not simply moving it to the cloud because it's on trend.

Level 13

Thanks for sharing.  Good post.

Level 13

There will come a day when the storm clouds come and everyone will be taking cover.  We have stuck our toe in the all mighty and thankfully pulled it out.

Level 14

Some high-level managers love the hype surrounding shiny new objects.  Then they expect us to secure their shiny new object, which isn't always possible.  Then they get to explain to their boss why they spent 6-7 figures on the new toy and it can't be used as advertised.

MVP

The mainframe analogy is dead on. I’m sure we will see future tech throw shade at cloud in the same fashion.

Community Manager

I've always felt that IT is cyclical.

Thin Client --> Mainframe does the work

Workstation does the work --> Servers

Thin Client --> Server does the work

Workstation does work --> Cloud

Workstation --> Cloud does the work

Everything old is new again, but just being rebranded.

MVP

Nice write up

Level 14

Yep.  The cloud looks great until you start to look under the hood.  It's just servers but they are in someone else's data centre and you have no control of them.

No MOAR Silos?  Ha!  Untrue.  The cloud carries its own unique silos that can narrow down supportability even further.

I find it interesting some companies are withdrawing some (or all!) assets from the cloud when it becomes apparent the cloud is not as reliable/secure/available/affordable as expected, or when certain applications or services don't lend themselves well to cloud access.

Sometimes bandwidth issues are blockers for using the cloud universally.  Here in rural northern Minnesota I can't get a Wave to each of my 100 sites so they all have 10G or 1G DIA uplinks.  I'm lucky when some can achieve a bundle of six T1s.  And yet all are expected to have the performance seen in my on-campus 40 Gbps connected buildings.

Sometimes it's watching the reports of supposedly resilient cloud services failing.  AWS and Azure and others have not had perfect track records, and extended outages to data/systems/services that were supposedly NEVER going to be unreachable reveal they CAN become unreachable.  Hopefully that'll get ironed out in the future.

Don't forget Carrington Events.  They have the ability to cause catastrophic and extended outages for everything in the cloud--as well as continental electrical grids.  And they're not isolated to 1859. Here's one. And here's another.

I won't jump on the Luddite bandwagon, but I also don't ignore the possibilities.  I'm not a survivalist, but I do ponder how life would immediately change if I couldn't get electricity and fresh water  and petroleum products.  Here's hoping that doesn't happen in our lifetime, but it's not acceptable to believe it's impossible.  Accepting what's possible and comparing the likelihood of it occurring are activities that may enable you or your company to be better prepared for them.

Hopefully once that happens we'll all be smart enough to recognize that silos are blockers to support & to knowledge--they must go away.  But until there's sufficient knowledge share and training I suspect they won't go away.

Level 20

One area I never see mentioned enough is Security and Information Assurance.  None of this cloud means anything if you can't secure your data.

Level 20

Commenting on the additional new silos from Rick... I see O365 as a perfect example.  I can't tell you how many companies thought this sounded too good to be true... then found out it was.

Community Manager

I, personally, love O365, but you are correct that it's not the same thing as having an Exchange Server in the office.  I will tell you one thing.  I'm super glad I never, ever, ever, need to worry about running out of disk space ever again.  There's nothing that'll sideline your day like out-of-control mailbox growth that fills up a volume and crashes the Exchange services.  That's a bad way to start a week.

I couldn't agree more.  Having Office 365 in a tenant of over 25000, with several domains and not having to ever manage hardware, patches, etc. again.   mmmmmmm  Yeah, worth it.   Unlimited OneDrive for Business with E3 and honestly the SharePoint is incredible.   Who wants SharePoint on-premises anymore?  O365 has been more better than more worse.   And yes I know "more better" is terrible English, but I speak American so it's fine....

MVP

Cloud computing provides a new lexicon of abilities and risks. It's largely up to customers to use these resources appropriately. That's the biggest advantage that an on-prem hosting staff has to offer: service and support.

MVP

I think the end state is the cloud does the work, AI obsoletes the need for us, and the people hope it works out in their favor.

MVP

Sounds like you've got a cautionary tale to share...

MVP

haha, i know the type of person you're talking about. they attend a single conference, have drinks with a vendor, and come back to the office with a NEW IDEA that will CHANGE EVERYTHING. i'm all for change when it's managed. but run like hell when you hear that you're ditching Windows for kubernetes.

MVP

I generally agree, and i see that sentiment expressed on twitter ALL. THE. TIME.

but to sum up any major cloud platform as "someone else's computer" is to acknowledge a fundamental misunderstanding of how cloud services are intended to be consumed. i'm as snarky as the next Internet denizen, but this one does annoy me a bit.

MVP

i nodded silently while i read your whole post. totally agree.

assumptions that cloud is 1) a monolithic solution for all problems, and 2) easy are irresponsible. it's a technology. by now we should have learned that no single technology will correct organizational or executive dysfunction.

re: silos, god help the office that creates a CLOUD TEAM.

MVP

O365 is SaaS! It's so easy! Just buy some subscriptions and start using it!

*24 months later...

So... do we have a governance plan in place for O365 yet?

MVP

I think most CSPs would quickly remind customers of the shared security model, and firmly place the responsibility to protect and secure your data in your hands. what do you think?

And the day after implementing O365 watching the Help Desk tickets go up and up and up, soaring far in the stratosphere.  Their common complaint:  "Everything Microsoft is slow!"  "I used to be able to send an e-mail to someone near me and they'd get it right away; with O365 it sometimes takes minutes or even hours for them to receive it." 

I can personally attest to that last one.  Kiwi sends me syslog messages for certain data center events via e-mail with O365.  Some I receive in a few minutes, some are a week late.

MVP

I like the messages MSFT sends about O365 issues. Clearly one-directional. "We noticed a problem with some email not being delivered. We'll probably fix it soon. Don't call us. Ever again."

I think if you persuade me to spend beaucoup dollars on something I can't see, on something I can't secure physically, and on something I can't monitor completely--you've done a great job of pulling the wool over my eyes.

What do we KNOW about cloud security?

  1. We cannot prove or verify its physical security.
    1. Is our data stored in a dozen diverse locations or just one?  Is it also stored in the Middle East or Russia or China? 
    2. We have no way to review or approve or deny physical access to the cloud data center resources.
    3. We assume / trust every cloud provider goes to the Nth degree of physical security, and that all staff / visitors / regulators / inspectors are forced to comply with that Nth degree security, but we cannot prove it to ourselves or our customers.
    4. We don't know that a cloud provider has vetted the history, security, and criminal record of every electrical inspector, OSHA inspector, HVAC technician, plumber, data cabler, etc. before that person/company is given physical access.
  2. When we can't control it physically we can't control its security.  If someone can slip in a USB key or a CD anywhere in the cloud's hardware, our data may be tampered with, stolen, or deleted.
  3. We can't prove who has logical access to data stored in the cloud.  Cloud operators bury clauses in the boiler plate's fine print to the effect of:   "We have firewalls in place, and we control all adjacent physical & logical spaces, but YOU the customer are responsible for securing your data from the Internet--OR from other customers in our environment."
  4. We're forced to trust the word of invisible entities, trading our money for the privilege of storing our data . . . where and with what?

I'll be skeptical and accept I'm living in the dark ages, blind to the positive potentials and focusing solely on the negative ones.

Level 16

I know a company that is still heavily using their mainframe(s) even though they have been retiring them for the past 20 years.

Problem now is their entire mainframe team is in their mid 60's and soon they are all going to retire themselves.

Community Manager

Sounds like retire or be retired.

Level 13

On our premises the mainframe team retired when the mainframe did. Having said that, the mainframe could have been looked upon as a large server with lots of virtual servers, a bit like VM. Lots of IT people were frightened of it because they didn't understand it.

Level 14

I have to disagree.  I'm a Systems Administrator managing a global infrastructure.  The 'Cloud', whether it is Azure, AWS, Google or any number of others, still has to run on tin.  That tin belongs to someone else and if you have no control of the tin, you don't have full control of the systems that run on it.  You are totally dependent on someone else managing it and securing it.  If you don't control who has physical access to the servers, how do you know what they are up to?  I know it is easy to say that someone else has the headache of managing hardware failure, hardware fault tolerance etc. but that also takes away some of the control.  It's OK saying Microsoft can host a Domain Controller somewhere but I really don't appreciate the latency from a DC hosted in say Texas for users in Singapore for example.  The users will complain of slowness because that's what they do.  I can better manage the user experience if I have control of where the hardware sits.

There is a place for 'cloud' but it isn't at the expense of local datacentres.  It should be as a complement to them.  I plan to have a few DCs in Azure as a backup in case local ones go down.  I still want users hitting the local ones first though.

MVP

No one has total control over their environment. Unless you've personally reviewed each line of code in each product you use, you're giving up some degree of control to your technology partners. Just like we trust VMware and Microsoft for our on-prem VMs today, we trust Amazon, Google, and others with our cloud workloads. As always, we have to design around these trusts to a cost-constrained degree.

I'm not as familiar with Azure as I am AWS and GCP, but I assume you can select from a list of regions for your cloud VMs to keep them close to your users. Is that an option for you? Just curious, really.

The hybrid model is where we're all heading, after doubting the cloud-first and cloud-native movements of the last several years.

MVP

Some come down on the side that the cloud is the future and the cloud is everything; others come down on the side of no, never, ever. It's wise to evaluate the pros and cons of on-premises, cloud, and co-located. (I'm sure I'm missing other options as well.)

An example for cloud would be churches streaming live services. I worked for a church that did this, and handling a very early, less attended service took 1 powerful server. Larger services could take up to 9 powerful servers.

Onsite that was about 9 powerful servers that needed to be kept available and only used when needed. About $10,000 + software + the DMZ to handle the traffic + maintenance + electrical and HVAC.

Using a web service they would just "spin up" a server as needed and pay for just that use. Instead of thousands of dollars per month (plus the administrator and his/her time) they paid under $500 per month as they just used the time and services as needed.

On the other hand, if you needed to run those 9 large instances in the cloud 24x7, then the cost of on-premises could be less.

There are many other concerns, but this is just one aspect.
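
The trade-off described above comes down to simple arithmetic, roughly sketched below. The hourly rate, the on-prem monthly figure, and the hours of use are hypothetical placeholders rather than numbers from the church example; only the count of 9 servers comes from it.

```python
HOURS_PER_MONTH = 730  # approximate average number of hours in a month


def monthly_cloud_cost(servers, hours_each, hourly_rate):
    """Pay-as-you-go cost: you are billed only for the hours each server runs."""
    return servers * hours_each * hourly_rate


def break_even_hours(on_prem_monthly, servers, hourly_rate):
    """Hours per month each server can run before cloud costs match on-prem."""
    return on_prem_monthly / (servers * hourly_rate)


# Hypothetical inputs: $1.00 per server-hour, $3,000 per month all-in on-prem.
occasional = monthly_cloud_cost(servers=9, hours_each=20, hourly_rate=1.00)
always_on = monthly_cloud_cost(servers=9, hours_each=HOURS_PER_MONTH, hourly_rate=1.00)
threshold = break_even_hours(on_prem_monthly=3000, servers=9, hourly_rate=1.00)
```

Under those assumed inputs, bursty use stays far below the on-prem figure while always-on use lands well above it, which is the point of the comparison: the answer depends on how many hours you actually run.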

Regarding O365, almost every single organisation I work with has issues with performance when they migrate from on-prem to pure O365. One look at a NetPath to 'outlook.office365.com' will show you just how much the endpoints in Microsoft-land struggle. The path is generally green all the way up to one or two hops before the endpoints.

I honestly think that the hybrid approach is the most sensible option for enterprise. Retain your on-prem at your large offices, migrate the rest into O365, and harness the benefits of Teams etc in the cloud. Corporate email just isn't ready for pure O365 at scale, even with a sensibly sized ExpressRoute with a decent partner.

I can attest to this.  100% spot-on.

About the Author
Long-time SolarWinds implementer and user. Spend my days now with vSphere, HDS, Nexus, and pretty much everything else.