T-SQL Tuesday: Cooking at Scale
This month’s T-SQL Tuesday is hosted by Rob Volk (b|t), who has asked us to talk about data using analogies. If you’ve ever sat through one of my presentations, you’ll know that I love using analogies to explain things. So, here’s an analogy that I haven’t used before!
Let’s pretend that I am making a simple spaghetti dinner for 2 people. This is a relatively simple, manageable task that I can complete fairly quickly. Simple recipes are abundant, and I can have dinner ready in less than 30 minutes.
What if I was hosting a small dinner party instead? I have to prepare the same exact spaghetti dinner, but now for a total of 8 people instead of 2. I can probably just scale up the amount of ingredients needed and still feed 8 people with the same recipe. It will probably require a bit more effort and cooking time, but I could likely get it done in under an hour.
Let’s take it to the next level and say that whenever we’re able to host SQL Saturday Chicago again, I’m so enthusiastically grateful that I decide to make spaghetti for the Speaker Dinner! My local SQL Saturday’s Speaker Dinner attendance ranges from 50 to 60 hungry members of the #SQLFamily that I would need to plan to feed! It would not be realistic for me to try and follow that same spaghetti dinner recipe at this scale. It would be best to take a different approach, like a caterer might, to make spaghetti for this large crowd.
But what if I cannot change anything? What if I am somehow forced to still follow that original recipe step by step? And what if I am constrained to using my home kitchen, 4-burner stove, 1-2 stock pots that I own, and I am the only person who can do the work? Am I going to be able to make 50 to 60 meals in an hour or less?
Now, let’s apply all of the above to SQL Server. This analogy is all about helping you to understand scaling of T-SQL queries.
- A T-SQL query is akin to asking for a spaghetti dinner.
- The execution plan created by the query optimizer is akin to the recipe that the cook follows.
- The volume of data that is parsed to return the final result set (logical reads) is akin to the number of people that must be fed.
If you get an execution plan that’s suitable for a small volume of data, and then reuse that plan for a much larger volume of data, that execution plan will most likely perform very poorly.
On the flip side, if you have an execution plan that was created for a large volume of data, and then reuse that plan for a small volume of data, that execution plan will most likely be fine but overkill.
To properly scale, you have to change up something, like the recipe you use or even the overall request, like a baked pasta casserole that can be made much more easily in great volume.
I hope this analogy resonated with you. The next time you hear “Well this query ran fine in development…” (which only has a tiny fraction of the data that production has), hopefully this analogy will come to mind and you can share it to help others understand why T-SQL queries often don’t scale without some adjustments.