Our Cloud Bill Is WHAT?! - Battle of the Clouds Series

Level 9

Submitted for your approval: a story of cloud horrors.

One of performance issues impacting production.

Where monthly cloud billing began spiraling out of control.

The following story is true. The names have been changed to protect the innocent.

During my consulting career, I’ve encountered companies at many different stages of their cloud journey. What was particularly fun about walking into this shop was that they had already moved about 75% of their environment into the public cloud. The remaining 25% was still waiting to be migrated off their aging hardware. They seemed to be ahead of the game, so why were my services needed?

Let’s set up some info about the company, which I’ll call “ABC Co.” ABC Co. provides medical staff and medical management to many hospitals and clinics, with approximately 1,000 employees and contractors spread across many states. Being in both medical staffing and recordkeeping, ABC Co. was subject to many compliance regulations such as HIPAA, PCI, etc. Their on-premises data center was on older hardware nearing end of life, and given the size of their IT staff, they decided to move out of the data center business.

The data center architect at ABC Co. did his homework. He spent many hours learning about public cloud, crunching numbers, and comparing virtual machine configurations to cloud-based compute sizing. Additionally, due to compliance requirements, ABC Co. needed to use dedicated hosts in the public cloud. After factoring in all the sizing, storage capacity, and necessary networking, the architect arrived at an expected monthly spend number: $50,000. He took this number to the board of directors with a migration plan and outlined the benefits of going to the cloud versus refreshing their current physical infrastructure. The board was convinced and gave the green light to move into the public cloud.

Everything was moving along perfectly early in the project. The underlying cloud architecture of networking, identity and access management, and security was deployed. A few workloads were moved up into the cloud to great success. ABC Co. continued their migration, putting applications and remote desktop servers in the cloud, along with basic workloads such as email servers and databases. But something wasn’t right.

End users started to complain of performance issues on the RDP servers. Application processing had slowed to a crawl. Employees’ ability to perform their tasks was being impeded. The architect and cloud administrators added more remote desktop servers to the environment and increased their size. Sizing on the application servers, which were just Microsoft Windows Servers in the public cloud, was also increased. This alleviated the problems, albeit temporarily. As more and more users logged in to the public cloud-based services, performance and availability took a hit.

And then the bill showed up.

It crept up slowly toward the anticipated $50,000 per month. Unfortunately, as a side effect of the added resources, the bill then rose to more than triple the original estimate presented to the board of directors. At the peak of the “crisis,” the bill surpassed $150,000 per month. This put the C-suite on edge. What was going on with the cloud migration project? How was the bill so high when they were promised a third of what was being spent? It was time for the ABC Co. team to call for an assist.

This is where I entered the scene. I’ll start this next section of the story by stating this outright: I didn’t solve all their problems. I wasn’t a savior on a white horse galloping in to save the day. I did, however, help ABC Co. start to reduce their bill and get cloud spend under control.

One of the steps they implemented before I arrived was a scripted shutdown of servers during non-work hours. This cut off some of the wasteful spend on machines not being used. We also looked at the actual usage of every server in the cloud. After running some scans, we found many servers that had not been used for 30 days or more but were still powered on and piling onto the bill. These servers were promptly shut down, archived, then deleted after a set time. Applications experiencing performance issues were analyzed, and it was determined they could be converted to a cloud native architecture. And those pesky ever-growing remote desktop boxes? Smaller, more cost-effective servers were placed behind a load balancer, with additional servers booted automatically whenever the user count demanded it. These are a few of the steps toward reducing the cloud bill. Many things occurred after I had left, but this was a start to send them on the right path.
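As a rough illustration of the cleanup steps above, here is a minimal Python sketch. It is not the tooling ABC Co. used. The `Server` record, the 7:00 to 19:00 weekday work window, the 25 users per server figure, and the minimum of two servers are all assumptions made for the example; only the 30-day idle threshold comes from the story. A real version would pull activity data from the cloud provider's monitoring service rather than from a hand-built list.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from math import ceil


@dataclass
class Server:
    name: str
    last_active: datetime  # hypothetical field: last time real usage was observed


def find_idle_servers(servers, now, idle_days=30):
    """Return servers with no recorded activity for at least idle_days."""
    cutoff = now - timedelta(days=idle_days)
    return [s for s in servers if s.last_active <= cutoff]


def is_work_hours(moment, start_hour=7, end_hour=19):
    """True on weekdays from start_hour (inclusive) to end_hour (exclusive)."""
    return moment.weekday() < 5 and start_hour <= moment.hour < end_hour


def desktops_needed(active_users, users_per_server=25, minimum=2):
    """How many small remote desktop servers to keep behind the load balancer."""
    return max(minimum, ceil(active_users / users_per_server))


# Example data (made up for illustration).
now = datetime(2024, 3, 15, 22, 0)  # a Friday evening
servers = [
    Server("rds-01", datetime(2024, 3, 14, 17, 0)),
    Server("app-legacy", datetime(2024, 1, 20, 9, 0)),
]

idle = find_idle_servers(servers, now)
print("Idle candidates:", [s.name for s in idle])
print("Shut down non-critical servers now?", not is_work_hours(now))
print("Remote desktop servers for 120 users:", desktops_needed(120))
```

Scaling out with small servers in fixed steps, instead of growing a few large ones, keeps spend roughly proportional to the number of users actually logged in.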

So, what can be learned from this story? While credit should be given for the legwork done to develop the strategy, on-premises virtual machines and public cloud-based instances aren’t an apples-to-apples comparison. Workloads behave differently in the cloud. The way resources are consumed has costs behind it; you can’t just add RAM and CPU to a problem server like you can in your data center (nor is it often the correct solution). Many variables go into a cloud migration. If your company is looking at moving to the cloud, be sure to ask the deep questions during the initial planning phase. It may just save hundreds of thousands of dollars.

21 Comments
Level 13

Thanks for the article.  It seems that poor planning and monitoring were the cause of the story.  Mistakes are inevitable, but catching them and correcting them = competence.

MVP

Thanks for the article.

Level 13

Interesting post. Thanks.  I've seen this both with cloud and on-prem.  Depending on the application, throwing more resources (CPU and/or memory) at it can actually make the problem *worse* at times.  You have to dig in and find out what the real issues and bottlenecks are before you start modifying things.

MVP

Just another reason the "cloud" should really be called "The Fog": bad things lurk there, unseen or unknown.

Since fog is just a ground-based cloud, is it relevant to the badge of honor, or is it pain?

Level 14

Great article...  The need for controlling the environment is equally if not more important when you move to the cloud.

Level 12

I am really curious now. Did the data center architect only crunch numbers to come to an estimate of the monthly cost or did he do any real-world experimentation to see what resources would be needed?

MVP

I believe there is a "behind the curtain" cost that is not well documented as a result of the server being "on"; cycles are being used to keep the server powered on even though it is virtual.  Those costs tend to bite people.

Level 12

Thank you for sharing these "lessons learned!"

Level 15

I just started investigating this for our SolarWinds environment.

Level 13

Thanks for the article

Level 9

Great Article. Thank you

Level 12

thanks for the post

MVP

Interesting. I'm still not convinced that cloud will save any money in the long term.

Level 9

The ABC C-Suite was probably looking at ROI over a shorter period of time and that may have dictated the pace of the migration.

The on-premises infrastructure evolved over years to meet the business needs and probably has some quirks that couldn't be easily quantified.

Successes were noted during the initial migration phase, and I think moving logical pieces of ABC's infrastructure over a longer time would have been a better approach. Move something up, stabilize it, and see if the intended effect/result is achieved. Then use whatever lessons were learned to refine the NEXT migration.

Level 13

Definitely seems like a case-by-case thing.  We've got a few things in the cloud.  What's worked better for us in most cases is SaaS solutions.  The cost savings are often exaggerated though.  A lot of times you're using a lot more than you think you will.

Level 12

Thanks for the article.

Level 11

I like the Twilight Zone theme. This reads like a helicopter salesman: "yup, everyone will have one, one day, get yours now."

Level 8

Nice article!

Level 12

Like the write up. Very informative.

Level 11

Nice article, looking forward to more of the series.

Level 8

I hope someone learns from this article. Very informative.