
Submitted for your approval: a story of cloud horrors.

One of performance issues impacting production.

Where monthly cloud billing began spiraling out of control.

 

The following story is true. The names have been changed to protect the innocent.

 

During my consulting career, I’ve encountered companies at many different stages of their cloud journey. What was particularly fun about walking into this shop was that they were already about 75% of the way into the public cloud, with the remaining 25% being migrated off their aging hardware. They seemed to be ahead of the game, so why were my services needed?

 

Let’s establish some background on the company, which I’ll call “ABC Co.” ABC Co. provides medical staff and medical management to many hospitals and clinics, with approximately 1,000 employees and contractors spread across many states. Being in both medical staffing and recordkeeping, ABC Co. was subject to many compliance regulations, such as HIPAA and PCI. Their on-premises data center was on older hardware nearing end of life, and given the size of their IT staff, they decided to move out of the data center business.

 

The data center architect at ABC Co. did his homework. He spent many hours learning about public cloud, crunching numbers, and comparing virtual machine configurations to cloud-based compute sizing. Additionally, due to compliance requirements, ABC Co. needed to use dedicated hosts in the public cloud. After factoring in all the sizing, storage capacity, and necessary networking, the architect arrived at an expected monthly spend number: $50,000. He took this number to the board of directors with a migration plan and outlined the benefits of going to the cloud versus refreshing their current physical infrastructure. The board was convinced and gave the green light to move into the public cloud.

 

Everything was moving along perfectly early in the project. The underlying cloud architecture of networking, identity and access management, and security was deployed. A few workloads were moved up into the cloud to great success. ABC Co. continued their migration, putting applications and remote desktop servers in the cloud, along with basic workloads such as email servers and databases. But something wasn’t right.

 

End users started to complain of performance issues on the RDP servers. Application processing had slowed to a crawl, and employees’ ability to perform their tasks was being impeded. The architect and cloud administrators added more remote desktop servers to the environment and increased their size. Sizing on the application servers, which were just Microsoft Windows Servers in the public cloud, was also increased. This alleviated the problems, albeit temporarily. As more and more users logged in to the public cloud-based services, performance and availability took a hit.

 

And then the bill showed up.

 

At first it crept up slowly toward the anticipated $50,000 per month. Unfortunately, as a side effect of the ever-increasing resources, the bill rose to more than triple the original estimate presented to the board of directors. At the peak of the “crisis,” the bill had surpassed $150,000 per month. This put the C-suite on edge. What was going on with the cloud migration project? How was the bill so high when they had been promised a third of what was now being spent? It was time for the ABC Co. team to call for an assist.

 

This is where I entered the scene. I’ll start this next section of the story by stating this outright: I didn’t solve all their problems. I wasn’t a savior on a white horse galloping in to save the day. I did, however, help ABC Co. start to reduce their bill and get cloud spend under control.

 

One of the steps they implemented before I arrived was a scripted shutdown of servers during non-work hours, which cut off some of the wasteful spend on machines not being used. We also looked at the actual usage of every server in the cloud. After running some scans, we found many servers that hadn’t been used in 30 days or more sitting powered on and piling onto the bill. These servers were promptly shut down, archived, then deleted after a set time. Applications experiencing performance issues were analyzed, and it was determined they could be converted to a cloud-native architecture. And those pesky, ever-growing remote desktop boxes? Smaller, more cost-effective servers were placed behind a load balancer configured to automatically boot additional servers should the user count demand it. These are just a few of the steps taken to reduce the cloud bill. Much more happened after I left, but this start sent them down the right path.
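The post doesn’t say which cloud provider or tooling ABC Co. used, but to make the scheduled-shutdown idea concrete, here’s a minimal sketch using the AWS SDK for Python (boto3). The “Schedule” tag, the region, and the office-hours convention are all assumptions for illustration; a real deployment would run something like this from a scheduler (cron, or a serverless function) at the close of business, with a matching start script each morning.

```python
# Hypothetical sketch: stop running EC2 instances tagged for office-hours use.
# Assumes AWS credentials are already configured and that non-production
# machines carry a "Schedule=office-hours" tag (an assumption, not from the post).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

def stop_office_hours_instances():
    """Find running instances tagged Schedule=office-hours and stop them."""
    paginator = ec2.get_paginator("describe_instances")
    pages = paginator.paginate(
        Filters=[
            {"Name": "tag:Schedule", "Values": ["office-hours"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    instance_ids = [
        instance["InstanceId"]
        for page in pages
        for reservation in page["Reservations"]
        for instance in reservation["Instances"]
    ]
    if instance_ids:
        ec2.stop_instances(InstanceIds=instance_ids)
    return instance_ids

if __name__ == "__main__":
    stopped = stop_office_hours_instances()
    print(f"Stopped {len(stopped)} instance(s): {stopped}")
```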

 

So, what can be learned from this story? While credit should be given for the legwork done to develop the strategy, on-premises virtual machines and public cloud-based instances aren’t an apples-to-apples comparison. Workloads behave differently in the cloud, and every resource consumed has a cost behind it; you can’t just throw RAM and CPU at a problem server like you can in your own data center (nor is that often the correct solution). Many variables go into a cloud migration. If your company is looking at moving to the cloud, be sure to ask the deep questions during the initial planning phase; it may just save hundreds of thousands of dollars.

Two roads diverged in a yellow wood,

And sorry I could not travel both

-Robert Frost

 

At this point in our “Battle of the Clouds” journey, we’ve seen what the landscape of the various clouds looks like, cut through some of the fog around cloud, and glimpsed what failing to plan can do to your cloud migration. So, where does that leave us? Now it’s time to lay the groundwork for the data center’s future. Beginning this planning and assessment phase can seem daunting, so in this post, we’ll lay out some basic guidelines and questions to help build your roadmap.

 

First off, let’s start with what’s already in the business’s data center.

 

The Current State of Applications and Infrastructure

 

When looking forward, you must always look at where you’ve been. By understanding previous decisions, you can gain insight into the business’s thinking, see where mistakes may have been made, and work to correct them in the newest iteration of the data center. Inventory everything in the data center, both hardware and software; you’d be surprised what might play a critical role in preventing a migration to new hardware or to a cloud. Look at the applications in use not only by the IT department but also by the business, as their implementation will be key to a successful migration.

 

  • How much time is left on support for the current hardware platforms?
    • This helps determine how much time is available before the plan has to be executed
  • What vendors are currently in use in the data center?
    • Look at storage, virtualization, compute, networking, and security
    • Many existing partners already have components to ease the migration to public cloud
  • What applications are in use by IT?
    • Automation tools, monitoring tools, config management tools, ticketing systems, etc.
  • What applications are in use by the business?
    • Databases, customer relationship management (CRM), business intelligence (BI), enterprise resource planning (ERP), call center software, and so on

 

What Are the Future-State Goals of the Business?

 

As much as most of us want to hide in our data centers and play with the nerd knobs all day, we’re still part of a business. Realistically, our end goal is to deliver consistent and reliable operations to keep the business successful. Without a successful business, it doesn’t matter how cool the latest technology you installed is, what its capabilities are, or how many IOPS it can push: you won’t have a job. Planning out the future of the data center has to line up with the future of the company. It’s a harsh reality we live in, but it doesn’t mean you’re stuck in your decision making. Use this opportunity to make the best choices on platforms and services based on the collective vision of the company.

 

  • Does the company run on a CapEx or OpEx operating model, or a mixture?
    • This helps guide decisions around applications and services
  • What regulations and compliances need to be considered?
    • Regulations such as HIPAA
  • Is the company attempting to “get out of the data center business?”
    • Why does the C-suite think this, and should it be the case?
  • Is there heavy demand for changes in the operations of IT and its interaction with end users?
    • This could lead to more self-service interactions for end users and more automation by admins
  • How fast does the company need to react and evolve to changes in the environment?
    • DevOps and CI/CD can come into play
    • Will applications need to be spun up and down quickly?
  • Of the applications inventoried in the current state planning, how many could be moved to a SaaS product?
    • Whether moving to a public cloud or simply staying put, the ability to reduce the total application footprint can affect costs and sizing.
    • This can also call back to the OpEx or CapEx question from earlier

Using What You’ve Collected

 

All the information is collected and it’s time to start building the blueprint, right? Well, not quite. One final step in the planning journey should be a cloud readiness assessment. Many value-added resellers, managed services providers, and public cloud providers can help the business with this step. It will collect deep technical data about the data center and its applications, map all dependencies, and provide an outline of what it would look like to move them to a public cloud. This information is crucial, as it lays out what can easily be moved and which applications would need to be refactored or completely rebuilt. The data can also feed a business impact analysis, which will give guidance on what these changes could do to the business’s ability to execute.

 

This seems like a lot of work. A lot of planning and effort goes into deciding whether to go to the public cloud or to stay put, to stick with “what works.” Honestly, many companies look at the work and decide to stay on-premises. Some choose to forgo the planning and have their cloud projects fail. I can’t tell you what to do in your business’s setting; you have to make the choices based on your needs. All I can do is offer up advice and hope it helps.

 

https://www.poetryfoundation.org/poems/44272/the-road-not-taken

Scenario: a mission-critical application is having performance issues during peak business hours. The app developers blame the storage. The storage team blames the network. The network admin blames the infrastructure. The cycle of blame continues until finally someone shouts, “Why don’t we just put it in the cloud?” Certainly, putting the application into the public cloud will solve all these issues, right? Right?! While it might sound like a tempting solution, simply installing an application on a server in the public cloud may not resolve the problem; it might open the company up to more unforeseen issues.

 

Failure to Plan Is Planning to Fail

 

The above adage describes one of the biggest roadblocks to successful cloud migrations. Often, when an application is considered for a move to the cloud, the scope of its interactions with servers, networks, and databases isn’t fully understood. What appears to be a Windows Server 2016 box with four vCPUs and 16 GB of RAM running an application turns out to be an interconnected series of SQL Server instances, Apache web servers, load balancers, application servers, and underlying data storage. If this configuration is giving your team performance issues on your on-premises hardware, why would moving it to hardware in a different data center resolve the problem?

 

If moving to the cloud is a viable option at this juncture of your IT strategy, it’s also time to consider refactoring the application into a more cloud-native format. What is cloud-native? Per the Cloud Native Computing Foundation (CNCF), the definition of cloud-native is:

 

“(Cloud-native) technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

 

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil.”

 

Cloud-native applications have been developed or refactored to use heavy automation, run in containers, shed operating system dependencies, and provide an elastic scalability that traditional persistent virtual servers cannot. With this model, applications become efficient not only in performance but in cost as well. However, refactoring an application to a cloud-native state can take a lot of time and money.

 

The Risks of Shadow IT

 

If you’ve taken the time to understand the application dependencies, a traditional application architecture can be placed in a public cloud while the app is refactored, which can help alleviate some issues. But again, the process can be time-consuming. Administrators can grow impatient during these periods, or frustrated if their requests for additional resources have been denied. The beautiful thing about public clouds is the relative ease of entry into their services: any Joe Developer with a credit card can fire up an AWS or Azure account on their own and have a server up and running within a matter of minutes by following a wizard.

 

Cool, my application is in the cloud and I don’t have to wait for the infrastructure teams to figure out the issues. Problem solved!

 

Until an audit finds customers’ credit card data in an AWS S3 bucket open to the public. Or the source of a ransomware outbreak is traced back to an unsecured Azure server linked to your internal networks. Oh, and let’s not even discuss the fact that an employee is paying for these services outside the purview of the accounting department or departmental budgets (a topic for another blog post later in this series).

 

Security and compliance can be achieved in the cloud, but much like before, it comes down to planning. By default, many public cloud services aren’t locked down to corporate or compliance standards, and unfortunately this isn’t widely known or advertised by the cloud vendors. It’s on the tenant to make sure their deployments are secure and their data is backed up. Proper cloud migration planning involves every team in the business’s IT department, including the security team. Everyone should work together to make sure the cloud architecture is designed in a way that allows for performance and scalability while keeping all data secure.
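To give one small, concrete example of the kind of check a tenant can run for themselves, here’s a sketch that flags S3 buckets with no public access block configured, in the spirit of the open-bucket audit finding above. It assumes AWS and boto3 purely for illustration (the post doesn’t prescribe a tool), along with credentials allowed to list buckets and read their public access block settings.

```python
# Hypothetical sketch: list S3 buckets that don't fully block public access.
# Assumes boto3 credentials with s3:ListAllMyBuckets and
# s3:GetBucketPublicAccessBlock permissions; not taken from the post.
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def buckets_missing_public_access_block():
    """Return names of buckets whose public access isn't fully blocked."""
    flagged = []
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        try:
            config = s3.get_public_access_block(Bucket=name)
            settings = config["PublicAccessBlockConfiguration"]
            if not all(settings.values()):
                flagged.append(name)
        except ClientError as err:
            # No configuration at all means nothing is being blocked.
            if err.response["Error"]["Code"] == "NoSuchPublicAccessBlockConfiguration":
                flagged.append(name)
            else:
                raise
    return flagged

if __name__ == "__main__":
    for name in buckets_missing_public_access_block():
        print(f"Review public access settings for bucket: {name}")
```

A check like this is no substitute for a real security review, but it’s the sort of guardrail the security team can automate alongside the architecture work described above.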

 

Throwing an application at the cloud isn’t a fix for poor architecture or aging technologies. It can be a valuable tool to help in the changing world of IT, but without proper planning it can burn your company in the end. In the next post in the “Battle of the Clouds” series, we’ll look at determining the best location for the workload and how to plan for these moves.

“The reports of my death are greatly exaggerated.” - The On-Premises Data Center (with apologies to Mark Twain)

 

As I mentioned in my previous post, the hype cycle has the on-premises data center dead to rights: public cloud is the future and there’s no turning back. But when you step back and investigate the on-premises and private cloud space, you’d be surprised how much life is left in it.

 

When the order comes down to move to the public cloud, sometimes you discover the application or workload isn’t suitable for migration. How many times have you been in an environment where an important line-of-business application was written by an employee who left 15 years ago and left no documentation? Is the application even running on an operating system one of the public cloud providers supports? While proper planning and refactoring can provide a path to move your application to the cloud, occasionally you’ll run into unavoidable physical limitations. In manufacturing environments, for example, you’ll often find the need for a custom hardware interface to a machine, or a USB license dongle required for an application to function.

 

Sometimes applications can’t move to the cloud despite careful planning. But not everyone takes the time to plan their cloud migrations, and failure to plan leads to many issues with cloud adoption (which I’ll discuss in my next post). What about your connectivity to the internet and to the cloud? Whether the problem is uptime or bandwidth, the network can prove to be a cloud roadblock. When these migrations fail, where does the workload go? Back to the on-premises data center!

 

Looking at purchasing new data center hardware or moving to the cloud often involves decision-makers outside of the IT department. While we’ll leave the deep fiscal details to the financial experts, Chief Financial Officers and business leaders often weigh the benefits of operating expenditures versus capital expenditures, commonly referred to as OpEx vs. CapEx. While the ability to quickly scale your cloud resources up and down based on demand might be a blessing to an application administrator, the cost variations that accompany it can prove difficult for the accounting department. The ability to make a one-time purchase every three to five years and amortize the cost over those years is a tried-and-true financial tactic.
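To make the contrast concrete, here’s a tiny worked example with made-up numbers (purely illustrative, not figures from the post): a one-time purchase amortized over five years produces a flat, predictable monthly figure, while consumption-based billing moves with demand.

```python
# Illustrative arithmetic only; all dollar figures are hypothetical.
capex_purchase = 600_000          # one-time hardware refresh
amortization_years = 5            # typical three-to-five-year cycle

capex_monthly = capex_purchase / (amortization_years * 12)
print(f"Amortized CapEx: ${capex_monthly:,.0f} per month, every month")

# Consumption-based OpEx varies with demand from month to month.
opex_monthly = [9_000, 11_500, 10_200, 14_800, 12_300, 18_900]
print(f"OpEx range: ${min(opex_monthly):,} to ${max(opex_monthly):,} per month")
```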

 

Speaking of finances, the fiscal performance of hardware vendors must certainly be a bellwether of cloud adoption. Sales of data center hardware have to be falling with the ever-growing adoption of public cloud, right? Wrong. Over the past year, vendors such as Dell Technologies, Pure Storage, and Lenovo have reported record earnings and growth. In May 2019, Dell Technologies announced 2% growth year over year. May also brought a revenue announcement from Lenovo of 12.5% growth year over year. August 2019 saw Pure Storage announce a whopping 28% year-over-year revenue growth. These companies are just a small sample. Clearly, physical data center hardware is still in high demand.

 

Looking at many factors, it’s easy to say the on-premises data center is far from dead. Continuing the “Battle of the Clouds” series, we’ll dive into why “just put it in the cloud” isn’t always the best solution. 

 

 

We’ve all heard it before.

 

“The cloud is the future!”

“We need to move to the cloud!”

“The on-premises data center is dead.”

 

If you believe the analysts and marketing departments, public cloud is the greatest thing to happen to the data center since virtualization. But is it true? Could public cloud be the savior of the IT department? While many features of the public cloud make it an attractive infrastructure replacement, failure to adequately plan for its use can prove to be a costly mistake.

 

Moving past the marketing, the cloud is simply “someone else’s computer.” Yes, it’s more complicated than that, but when you boil it down to the basics, it’s a data center maintained by a third party, with proprietary software on top to provide an easy-to-use dashboard for provisioning and monitoring. When you move to the cloud, you’re still running an application on a server, and many of the same problems you have with your application running on-premises can persist in the cloud.

 

In a public cloud environment, the added complexity of multi-tenancy on the underlying resources can complicate things. Now you have to think about regulatory compliance, too. And after all, a public cloud is still a data center subject to human error, as has been made evident over and over, most famously by the Amazon Web Services S3 outage of February 2017.* The wide adoption of public clouds such as AWS and Microsoft Azure has also opened the door to more instances of shadow IT: rogue devs, admins, and end users who either don’t have the patience to wait or have been denied resources open cloud accounts with their own credit cards and put corporate data at risk. And we have yet to even take the consumption-based billing model into consideration.

 

Even with the “issues” listed above (I put quotes around issues because some of these problems can also be encountered in the private cloud, or worked around), public cloud can be an awesome tool in the IT administrator’s toolbox. Properly architected cloud-based applications can alleviate performance issues and can be developed with robust redundancies to avoid downtime. The ability to quickly scale compute up and down based on demand gives the business an agility not seen before in the standard data center procurement cycle. And the growing world of SaaS products provides an easy gateway into the cloud (yes, I’m going to take the stance that as-a-Service qualifies as cloud). The introduction of cloud technologies has also opened up a world of new application deployment models, such as microservices and serverless computing, that weren’t possible until recently.

 

Is there hype around public cloud? For sure! Is some of it warranted? Absolutely! Is it the be-all and end-all technology of the future? Not so fast. In the upcoming series of posts I’m calling “Battle of the Clouds,” we’ll look at public cloud versus private cloud, going past the hype to dive into the state of on-premises data centers, what it takes for a successful cloud implementation, and workload planning around both solutions. I look forward to hearing your opinions on this topic as well!

 

*Summary of the Amazon S3 Service Disruption in the Northern Virginia (US-EAST-1) Region
