Our Cloud Bill Is WHAT?! - Battle of the Clouds Series

Submitted for your approval; a story of cloud horrors.

One of performance issues impacting production.

Where monthly cloud billing began spiraling out of control.

The following story is true. The names have been changed to protect the innocent.

During my consulting career, I’ve encountered companies at many different stages of their cloud journey. What was particularly fun about walking into this shop is they were already about 75% up into public cloud. The remaining 25% was working towards being migrated off their aging hardware. They seemed to be ahead of the game, so why were my services needed?

Let’s set up some info about the company, which I’ll call “ABC Co.” ABC Co. provides medical staff and medical management to many hospitals and clinics, with approximately 1,000 employees and contractors spread across many states. Being in both medical staffing and recordkeeping, ABC Co. was subject to many compliance regulations such as HIPAA, PCI, etc. Their on-premises data center was on older hardware nearing end of life, and given the size of their IT staff, they decided to move out of the data center business.

The data center architect at ABC Co. did his homework. He spent many hours learning about public cloud, crunching numbers, and comparing virtual machine configurations to cloud-based compute sizing. Additionally, due to compliance requirements, ABC Co. needed to use dedicated hosts in the public cloud. After factoring in all the sizing, storage capacity, and necessary networking, the architect arrived at an expected monthly spend number: $50,000. He took this number to the board of directors with a migration plan and outlined the benefits of going to the cloud versus refreshing their current physical infrastructure. The board was convinced and gave the green light to move into the public cloud.

Everything was moving along perfectly early in the project. The underlying cloud architecture of networking, identity and access management, and security were deployed. A few workloads were moved up into the cloud to great success. ABC Co. continued their migration, putting applications and remote desktop servers in the cloud, along with basic workloads such as email servers and databases. But something wasn’t right.

End users started to complain of performance issues on the RDP servers. Application processing had slowed down to a crawl. The employee’s ability to perform their tasks was being impeded. The architect and cloud administrators added more remote desktop servers into the environment and increased their size. Sizing on the application servers, which were just Microsoft Windows Servers in the public cloud, were also increased. This alleviated the problems, albeit temporarily. As more and more users were logging in to the public cloud-based services, performance and availability took a hit.

And then the bill showed up.

It was creeping up slowly to the anticipated $50,000 per month. Unfortunately, as a side effect of the increasing resources, the bill had risen to more than triple the original estimates presented to the board of directors. In the peak of the “crisis,” the bill had surpassed $150,000 per month. This put the C-suite on edge. What was going on with the cloud migration project? How is the bill so high when they were promised a third of what was being spent? It was time for the ABC Co. team to call for an assist.

This is where I entered the scene. I’ll start this next section of the story by stating this outright: I didn’t solve all their problems. I wasn’t a savior on a white horse galloping in to save the day. I did, however, help ABC Co. start to reduce their bill and get cloud spend under control.

One of the steps they implemented before I arrived was to use scripted shutdown of servers during non-work hours. This cut off some of the wasteful spend on machines not being used. We also looked at the actual usage on all servers in the cloud. After running some scans, we found many servers not in use for 30 days or more being left on and piling onto the bill. These servers were promptly shut down, archived, then deleted after a set time. Applications experiencing performance issues were analyzed, and it was determined they could be converted to a cloud native architecture. And those pesky ever-growing remote desktop boxes? Smaller, more cost-effective servers were placed behind a load balancer to automatically boot additional servers should the user count demand it. All these are a few of the steps to reducing the cloud bill. Many things occurred after I had left, but this was a start to send them on the right path.

  So, what can be learned from this story? While credit should be given for the legwork done to develop the strategy, on-premises virtual machines and public cloud-based instances aren’t apples to apples. Workloads behave differently in the cloud. The way resources are consumed has costs behind it; you can’t just add RAM and CPU to a problem server like you can in your data center (nor is it often the correct solution). Many variables go into a cloud migration. If your company is looking at moving to the cloud, be sure to ask the deep questions during the initial planning phase—it may just save hundreds of thousands of dollars.