Last week was Microsoft Ignite in Atlanta. I had the privilege of giving two presentations, one of which was titled "Performance Tuning Essentials for the Cloud DBA." I was thinking of sharing the slides, but the slides are just there to enhance the story I was telling. So I've decided instead to share the narrative here in this post today, including the relevant images. As always, you're welcome.
I started with two images from the RightScale 2016 State of the Cloud Report:
The results of that survey help to show that hybrid IT is real, it's here, and it is growing. Using that information, combined with the rapid advances we see in the technology field with each passing year, I pointed out how we won't recognize IT departments in five years.
For a DBA today, and also the DBA in five years, it shouldn't matter where the data resides. The data can be either down the hall or in the cloud. That's the hybrid part, noted already. But how does one become a DBA? Many of us start out as accidental DBAs, or accidental whatevers, and in five years there will be accidental cloud DBAs. And those accidental cloud DBAs will need help. Overwhelmed at first, the cloud DBA will soon learn to focus on their core mission:
Once cloud DBAs learn to focus on their core mission (recovery), they can start learning how to do performance troubleshooting (because bacon ain't free). I believe that when it comes to troubleshooting, it is best to think in buckets. For example, if you are troubleshooting a virtualized database server workload, the first question you should be asking yourself is, "Is the issue inside the database engine or is it external, possibly within the virtual environment?" In time, the cloud DBA learns to think about all kinds of buckets: virtual layers, memory, CPU, disk, network, locking, and blocking. Existing DBAs already have these skills. But as we transition to being cloud DBAs, we must acknowledge that there is a gap in our knowledge and experience.
That gap is the network.
Most DBAs have little to no knowledge of how networks work, or of what traffic is using them. A database engine, such as SQL Server, has little knowledge of any network activity. There is no DMV to expose such details, and a DBA would need to collect O/S level details on all the machines involved. That's not something a DBA currently does; we take networks for granted. To a DBA, networks are like the plumbing in your house. It's there, and it works, and sometimes it gets clogged.
But the cloud demands that you understand networks. Once you go cloud, you become dependent upon networks working perfectly, all the time. One little disruption, because someone didn't call 1-800-DIG-SAFE before carving out some earth in front of your office building, and you are in trouble. And it's more than just the outage that may happen from time to time. No. You need to know about your network as a cloud DBA for the following reasons: RPO, RTO, SLA, and MTTI. I've talked before about RPO and RTO here, and I think anyone reading this would know what SLA means. MTTI might be unfamiliar, though. I borrowed that from adatole. It stands for Mean Time To Innocence, and it is something you want to keep as short as possible, no matter where your data resides.
You may have your RPO and RTO well-defined right now, but do you know if you can meet those metrics right now? Turns out the internet is a complicated place:
Given all that complexity, it is possible that data recovery may take a bit longer than expected. When you are a cloud DBA, the network is a HUGE part of your recovery process. The network becomes THE bottleneck that you must focus on first and foremost in any situation. In fact, when you go cloud, the network becomes the first bucket you need to consider. The cloud DBA will need to know in five minutes or less whether the network is the issue, before spending any time trying to tune a query. And that means the cloud DBA is going to have to understand what is clogging that pipe:
Because when your phone rings, and the users are yelling at you that the system is slow, you will want to know that the bulk of the traffic in that pipe is Pokemon Go, and not the data traffic you were expecting.
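To make the "network bucket first" idea concrete, here is a minimal sketch of a quick triage check. It is not from the talk: the function names, the 100 ms threshold, and the verdict labels are all assumptions picked for illustration. It times a handful of TCP connections to the database endpoint and turns the samples into a rough verdict. It will not tell you what is clogging the pipe (you need network monitoring for that), only whether the pipe deserves a look before the query plan does.

```python
import socket
import statistics
import time


def tcp_connect_ms(host, port, timeout=2.0):
    """Time one TCP connection attempt in milliseconds; None if it fails."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return (time.perf_counter() - start) * 1000.0
    except OSError:
        # Covers refused connections, timeouts, and name-resolution errors.
        return None


def network_verdict(samples_ms, slow_ms=100.0):
    """Classify connection samples; slow_ms is an assumed, tunable threshold."""
    good = [s for s in samples_ms if s is not None]
    if not good:
        return "unreachable"
    if len(good) < len(samples_ms):
        return "intermittent"  # some attempts failed or timed out
    if statistics.median(good) > slow_ms:
        return "slow"
    return "ok"
```

You would call `tcp_connect_ms` five or ten times against your own endpoint (for SQL Server, typically port 1433) and pass the list of results to `network_verdict`. Anything other than "ok" means the network bucket comes before the query-tuning bucket.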
Here's a quick list of tips and tricks to follow as a cloud DBA.
- Use the Azure Express! Azure ExpressRoute is a dedicated link to Azure, and you can get it from Microsoft or a managed service provider that partners with Microsoft. It's a great way to reduce the complex web known as the internet, and give you better throughput. Yes, it costs extra, but only because it is worth the price.
- Consider Alt-RPO, Alt-RTO. For those times when your preferred RPO and RTO needs won't work, you will want an alternative. For example, you have an RPO of 15 minutes, and an RTO of five minutes. But the network is down, so you have an Alt-RPO of an hour and an Alt-RTO of 30 minutes, and you are storing backups locally instead of in the cloud. The business would rather be back online, even to the last hour, as opposed to waiting for the original RPO/RTO to be met.
- Use the right tools. DBAs have no idea about networks because they don't have any tools to get them the details they need. That's where a company like SolarWinds comes in to be the plumber and help you unclog those pipes.
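The Alt-RPO/Alt-RTO tip boils down to simple arithmetic, so here is a rough sketch of it. Everything in it is an assumption for illustration: the function names, the 0.7 efficiency factor (real links rarely deliver their full rated bandwidth), and the simplification that recovery time is dominated by pulling the backup across the wire. It ignores the time the restore itself takes, so treat the result as a best case.

```python
def transfer_minutes(backup_gb, link_mbps, efficiency=0.7):
    """Estimate minutes to move a backup over a link rated in megabits/s."""
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    megabits = backup_gb * 8 * 1000  # decimal gigabytes -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 60


def choose_targets(backup_gb, link_mbps, rto_min=5, alt_rto_min=30):
    """Return (plan, target_minutes): primary RTO or the Alt-RTO fallback."""
    if link_mbps <= 0:
        # Link is down: restore from local backups under the alternate targets.
        return ("alternate", alt_rto_min)
    if transfer_minutes(backup_gb, link_mbps) <= rto_min:
        return ("primary", rto_min)
    return ("alternate", alt_rto_min)
```

With the numbers from the example above (RTO of five minutes, Alt-RTO of 30), a 50 GB backup over a 1 Gbps link works out to roughly nine and a half minutes of transfer at 70% efficiency, so the five-minute RTO is already out of reach before the restore even starts. Better to find that out from a calculator than from an outage.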
Thanks to everyone who attended the session last week, and especially to those who followed me back to the booth to talk data and databases.