I’ve got a quick quiz for you today. Which scenario is worse?
SQL Statement 1 executes 1,000 times, making end-users wait 10 minutes. 99% of the wait time for SQL Statement 1 is “PAGEIOLATCH_EX.”
SQL statement 2 executes one time, also making the end-users wait 10 minutes. 99% of the wait time for SQL Statement 2 is “LCK_M_X.”
The answer is that both are equally bad, because they made another user wait 10 minutes. It doesn’t matter to the end-user if the root cause is disk, memory, CPU, network, or locking/blocking. They only care that they have to wait ten minutes.
The end-users will pressure you to tune the queries to make it faster. They want speed. You will then, in turn, try to tune these queries to reduce their run duration. Speed and time become the measuring sticks for success.
Many data professionals put their focus on run duration. But by focusing only on run duration, they overlook the concept of throughput.
Time for another quiz: Which scenario is better?
You can tune SQL Statement 3 to execute 1,000 times, and run for 30 seconds.
You can tune SQL Statement 4 to execute 10,000 times, and run for 35 seconds.
The extra five seconds of wait time is a tradeoff for being able to handle 10x the load. I know I’d rather have 10,000 happy users than 9,000 unhappy ones. The trouble here is that being able to tune for throughput can take a lot more effort than tuning for duration alone. It’s more important to get up and running than it is to design for efficiency.
Once upon a time, we designed systems to be efficient. This, of course, led to a tremendous number of billable hours as we updated systems to avoid the Y2K apocalypse. But today, efficiency seems to be a lost art. Throwing hardware at the problem gets easier with each passing day. For cloud-first systems, efficiency is an afterthought because cloud makes it easy to scale up and down as needed.
Building and maintaining efficient applications that focus on speed as well as throughput requires a lot of discipline. Here’s a list of three things you can do, starting today, for new and existing database queries.
Examine current logical I/O utilization
To write queries for scale, focus on logical I/O. The more logical I/O needed to satisfy a request, the longer it takes to run and the less throughput you will have available. One of the largest culprits for extra logical I/O is the use of incorrect datatypes. I put together a couple of scripts a while back to help you with this. One script will look inside a SQL Server® database for integer values that may need to have their datatypes adjusted. Or you can run this script to check the datatypes currently residing in memory, as those datatypes are likely to be the ones you should focus on adjusting first. In either case, you are being proactive in the measuring and monitoring of the data matching the defined datatype.
Take a good look at what is in your pipe
After you have spent time optimizing your queries for logical I/O, the last thing you want is to find that you have little bandwidth available for data traffic because everyone in the office is playing Pokemon®. We started this post talking about speed, then throughput, and now we are talking about capacity. You need to know what your maximum capacity is for all traffic, how much of that traffic is dedicated to your database queries, and how fast those queries are returning data to the end-users.
Test for scalability
You can use tools such as HammerDB and Visual Studio to create load tests. After you have optimized your query, see how it will run when executed by simultaneous users. I like to take a typical workload and try running 10x, 25x, 50x, and 100x test loads and see where the bottlenecks happen. It is important to understand that your testing will not likely simulate production network bandwidth, so keep that metric in mind during your tests. You don’t want to get your code to be 100x capable only to find you don’t have the bandwidth.
When it comes to performance tuning your database queries, your initial focus will be on speed. In addition to speed, it is necessary to consider throughput and capacity as important success metrics for tuning your queries. By focusing on logical I/O, testing for scalability, and measuring your network bandwidth, you will be able to maximize your resources.