SQL Sentry Plan Explorer 3.0 Demo Kit
Last month we released the new, all-free, Plan Explorer 3.0, and now seemed the right time to follow that up with the 3rd version of the demo kit, too. Please consider earlier versions of the demo kit deprecated; I won't remove them, but I will add a note to those pages soon.
Now, you might ask, "What is a demo kit?" We came up with this concept back in 2011, after being asked by multiple presenters to help them share Plan Explorer with their own audiences.
One purpose is to serve as a starting point to build your own presentations about Plan Explorer specifically, or even about query tuning in SQL Server in general.
The other purpose I see is to act as a training aid - haven't used Plan Explorer before? Don't feel you've taken advantage of all of the features? The demos here are designed to showcase most of the features, and to demonstrate how we make execution plan problems much more obvious and easier to solve. We hope it makes you more productive with the software.
There are two versions of the kit you can download; one with a sample database, and one without. See details below.
Make sure you have the most recent version of Plan Explorer installed - you can always download it from our web site.
The session and other files can be opened directly in Plan Explorer and, for the most part, used without ever connecting to a database. But I also decided as a part of this update to abandon plans that used various versions of AdventureWorks or our own databases and use the sample database Microsoft is committing to going forward: WideWorldImporters.
I went to great lengths to make a copy of that database that can be restored on any edition and on any version, from SQL Server 2008 Express all the way up to SQL Server 2016. Ripping out features added after 2008 turned out to be a much more complex undertaking than I thought; I'll blog about that separately and update this space with links. Without going into a bunch of detail of what's missing from the full version of the database, just think of it as roughly the same schema and data, but without modern or Enterprise features. I am trying to get it (or something very close) incorporated into the official GitHub repo and, again, will update this space when that happens.
You'll need this backup restored to an instance of SQL Server 2008 or greater, any edition, if you want to run any of the queries interactively. You are free to run these same queries against the official WideWorldImporters database, and they should "work" in all cases, however you will not necessarily see the same plans (for example, you may see ColumnStore operations if you are using 2016 Enterprise, but not if you have a different edition, and hints to block those would fail in lower versions).
Here is a
RESTORE command you can modify for your environment:
RESTORE DATABASE WideWorldImporters_Legacy FROM DISK = N'<location of BAK file here>' WITH REPLACE, RECOVERY, MOVE N'WWI_Legacy_Data' TO N'<location of data files here>\WWI_Legacy.mdf', MOVE N'WWI_Legacy_Log' TO N'<location of log files here>\WWI_Legacy.ldf';
Note that when you run the queries interactively, you'll need to go to Edit > Connection and change
GORDIE\SQL2008 to whatever you use to connect to your instance, as my local development connection details are the ones that are stored with the session file. If you change the name of the database, you'll need to modify that information in the connection dialog as well.
.pesession file can be opened in Plan Explorer, and you can move to the different steps of the demo using the History window (each step in the demo is a "version"). There are comments to help identify which version in the History corresponds to which step in the demo below. Note that I ran through these demos multiple times, so the numbers you see for very precise metrics like duration or CPU might not match exactly between the session file you download and the screen shot I attached to the blog post. Also note that if you generate new actual or estimated plans, you will change the History window too, so make sure to save off a backup of the
.pesession file before you start, or save it as a new
.pesession file after you've opened it.
You can move to any History Version # referenced below simply by clicking on that row in the History window.
This entry isn't used for anything plan- or demo-related, but simply to reset the system to the beginning of your demo - it clears the procedure cache and buffer pool, and changes indexes back to how they started. It uses this procedure (click to show):
CREATE PROCEDURE PlanExplorerDemo.Reinitialize AS BEGIN; SET NOCOUNT ON; CREATE INDEX [FK_Sales_InvoiceLines_InvoiceID] ON [Sales].[InvoiceLines]([InvoiceID] ASC) WITH (DROP_EXISTING = ON) CREATE INDEX [FK_Sales_Invoices_CustomerID] ON [Sales].[Invoices]([CustomerID] ASC) WITH (DROP_EXISTING = ON); DECLARE @db int; SET @db = DB_ID(); DBCC FLUSHPROCINDB(@db) WITH NO_INFOMSGS; DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS; END; GO
Virtually any demo can be used to show the better use of screen real estate, the plan layout controls, the use of color to highlight problems, how we point out discrepancies between estimated and actual row counts, and the ability to view costs by I/O, CPU, or both.
In the first demo, we run a query that attempts a join between
InvoiceLines, intentionally constructed to yield two key lookups and a sort:
DECLARE @CustomerID int = 1046; SELECT il.StockItemID, il.ExtendedPrice, i.CustomerID, i.InvoiceID, 0 - il.Quantity, i.InvoiceDate FROM Sales.InvoiceLines AS il INNER JOIN Sales.Invoices AS i ON il.InvoiceID = i.InvoiceID WHERE i.CustomerID = @CustomerID ORDER BY il.InvoiceID, il.StockItemID;
After running this query (or moving to item #2 in the session file), you can show the information in the statement grid that you won't get from Management Studio by default, such as CPU, Duration, and Reads. You can also show that we highlight the difference between estimated rows and actual rows when they differ by a large enough percent. This discrepancy can be caused by a variety of things, including:
- Out-of-date statistics - usually the most common cause is that the current statistics do not reflect the actual data. You can resolve this using UPDATE STATISTICS (and you can see more information about statistics on the Index Analysis tab, described later).
- No histogram - perhaps there are no stats already and auto-create statistics is disabled, or columns are not in the leading key of the index, the optimizer may need to use density or average distribution information instead. This can lead to very vague, ballpark guesses.
- Sheer complexity - in some cases the optimizer just has too much work to do trying to determine estimates, for example if filtering is occurring against many columns or using many predicates.
Statistics discrepancies are important to note at the statement level, but they are important at the individual operator level as well. We highlight these discrepancies in red on the plan diagram for an actual plan, so you can see exactly where incorrect estimates are happening. These can cascade from one operator throughout an entire plan, causing the optimizer to make sub-optimal decisions.
The grid also includes certain columns dynamically, only when they're relevant (in this case, Key Lookups and Sort Operations). You can show the ability to add other columns at will by right-clicking the column header, choosing Column Chooser, and dragging any column onto the grid. The metrics you see should look something like this in this case:
Moving to the Plan Diagram, you can show that color is used to indicate problem areas. There are two Key Lookups in this plan, but one clearly has an overwhelming estimated cost. You can stress that these costs are estimated and may not reflect reality at all (something we'll see in a different plan below), but they can still be used as a rough guideline on where to focus.
The Plan Tree and Top Operations tabs have some useful information (for example, the Actual Executions column shows part of the reason the highlighted Key Lookup is so expensive). They show essentially the same information; the Plan Tree fixes the grid into a structure like you would see in
SHOWPLAN_TEXT, while Top Operations presents a grid sortable by any column.
If you switch to the Query Columns tab, you can show how we highlight the columns that made the more expensive Key Lookup necessary. This is to provide some evidence that changing the index to either have these columns in the key or include list will probably help the performance of this query.
Note that when I'm talking about this, I always stress the importance of considering both sides of the workload - while making an index wider can help this one specific query, it might not be run enough to justify the change, and the change may not benefit any other queries, either. Most importantly, changing the index can create a lot more work for all of your write queries, too. (I talk about some of the decision factors, which are relevant both for creating new indexes and changing existing indexes, in the post, Don't just blindly create those "missing" indexes!.)
A new feature in Plan Explorer 3.0 is Index Analysis, designed to vastly improve the way you look at queries and consider index improvements. Let's move to that tab and see what we have. The problematic Key Lookup is the one associated with the Index Seek on InvoiceLines, so let's select that operation (Node 15) in the Selected Operation list, and uncheck Other and Indexed from Visible Columns so we can focus on the columns used in this query. You should see something very close to this:
There's a lot of detail here - essentially on the left you have information about the columns, including statistics, estimated size, and whether they are sorted or are included in a predicate.
On the right, you have all of the indexes on the table, starting with the clustered index (if there is one). Next will be the index selected for this specific operation (though sometimes that will be the clustered index). After that, though clipped from the screen shot above, you will see the rest of the indexes on the table, ordered from left to right by what we call "Score." The algorithm for determining an index's score is not public, but you can assume that, for example, a covering index with a leading key column that is used in both a predicate and a sort will score much higher than a non-covering index with a leading key column that is sorted in the wrong direction.
Now, if we look closer at the index that was chosen for this query, we see that it only scored 63%. The reason? It isn't covering. Meaning there are columns missing from the index, causing the key lookup. In the grid, we highlight those for you in salmon, to make it obvious which columns would help make this index better (or to use in a different index, if you wanted to go that way). The functionality we provide here to allow you to fix this, though, is quite a different way to think about index improvement. You can click into any of those missing column cells, make them a part of the index (hypothetically), and see how it improves the score of the index.
Since you know from the left side of the grid (or from the query text) that
StockItemID is both part of the output and part of the
ORDER BY, the first thing you do is change that column so that it is second in the key list. You do that by clicking on the cell and selecting the number 2 (and if the sort order were descending, you'd click the sort icon to change that order). This makes the index score 71%.
InvoiceLineID column says it's involved in a predicate, but that's the XML playing tricks on us (if you expand the predicate, you'll see that it's just validating against itself, as part of the nested loops operation). As the clustering key, it's included in this non-clustered index anyway, so you can move on to the other two columns. If key add those to the include list, by clicking the cell and choosing "Included," we see the score change again - this time to 90%.
Not perfect, and keep in mind it won't always be possible to hit 100%, but certainly something we can try to improve the performance of the query. Click on the script button at the bottom left of that column (
<s>), and it will give you a new dialog with a batch of SQL involving a
DROP INDEX /
CREATE INDEX. You don't have to do anything with this (it will be in the next entry in the History), but mention that you can run this directly here, copy it to a query window to modify before running, or just save it to a text file for later consideration. Click Close.
Before moving on, let's see if you can't improve this query further by removing the other Key Lookup. At the top of the tab, switch the Selected Operation to the other Index Seek (Node 9). You should see something like this:
InvoiceID is shown with a predicate because of the nested loop, not because it's needed for the index, so you can focus on the other column that makes the Key Lookup necessary -
InvoiceDate. This column is clearly required for output but not involved in any sort operations or predicates, so we can change the column to be included in the index, and watch the score jump from 75% to 100%.
You can show the output in the
<s> dialog, but know that both index scripts will be included in History version 3.
If you move to history item #3, you'll see that the indexes have been re-created with the adjustments we scripted out in step 2, and then the same query is run again. This time you see a much better plan, with two Key Lookups removed and, as a side effect of adding a column to the key of one of the indexes, a Sort has been removed as well:
Next we can take a look at the histogram, which can be a useful source of information as well. For the foreign key index on Invoices, the distribution is fairly uniform. This query is slightly different in that we're looking for a range of customers. We'll start with a low of 800 and a high of 830 (yielding about 10,000 rows):
DECLARE @CustomerIDLow int = 800, @CustomerIDHigh int = 830; SELECT il.StockItemID, il.ExtendedPrice, i.CustomerID, i.InvoiceID, 0 - il.Quantity, i.InvoiceDate FROM Sales.InvoiceLines AS il INNER JOIN Sales.Invoices AS i ON il.InvoiceID = i.InvoiceID WHERE i.CustomerID >= @CustomerIDLow AND i.CustomerID <= @CustomerIDHigh ORDER BY il.InvoiceID, il.StockItemID;
Look at the Index Analysis tab and make sure that the Index Seek is selected - in this case there is only a seek against
Invoices, while a scan was chosen for
InvoiceLines. You will see that there is a histogram below, with a rather uniform and uninteresting distribution, with a range of values highlighted - this represents the range of the runtime parameters (allowing you to spot parameter sniffing problems, say, where a scan is chosen instead of a seek or vice versa):
In the Parameters tab next to the Histogram, we can also test other values. For example, you can put 176 as high and low end of the range, click Get Est Plan, and you will end up with a slightly different looking histogram (as shown in History version 5):
And if you repeat that process with values outside of the range, say 1100 and 1120 (to simulate an ascending key problem), you can see we paste a big red warning label there, with a tooltip that describes the problem (this is visible under History version 6):
This is a rather lengthy query adapted from one of the views in the sample database. It demonstrates how you can visualize the tables and relationships involved in a query by moving to the Join Diagram tab. This can be quite useful for detangling and understanding queries involving views, especially when they are nested, as we show the underlying relationships between the base tables:
It also shows our Missing Indexes feature, which you can access by right-clicking the SELECT node on the Plan Diagram (and I'll repeat the warning here, don't just create every missing index recommended to you!):
And finally, the Table I/O tab shows how we organize
SET STATISTICS IO ON; data for you in a sortable grid.
This has a complex query that runs long enough to show off our Live Query Capture, replay, and actual plan recosting features. Note that it can only be run interactively if you restore the database (or have an existing WideWorldImporters database) on an instance of SQL Server 2014 SP1 or newer, and if you enable "With Live Query Profile" under Get Actual Plan, but you can still show all of the details - even without connecting to a database.
If you open this history version, and move to the Plan Diagram tab, you will see it has some new "stuff" - a chart showing the progression of CPU, I/O, and Waits used throughout the execution of the statement, and a little set of play controls in the middle. The play controls let you replay the execution of the query - without executing against the server - to help you pinpoint exactly when spikes in any resource occurred (in this case, you'll see that they correlate to the activity from
What's also interesting about this plan is that you can show how different estimated costs and actual costs can be - in many cases the estimated costs have no relevance whatsoever to what will happen when the query is executed. If you turn on "Show Estimated Plan" on the toolbar (or right-click the plan and select "Estimated Costs"), you see that SQL Server guessed that the Sort would be the most expensive operation in this plan. If you switch back to Actual, because we've collected per-operator resource usage, we can actually tell you very accurately what the actual costs were.
And since most SQL Server deployments are I/O-bound, we find it very useful to display the costs by I/O, which is not possible in Management Studio. If you right-click the plan again and choose
Costs By > I/O, and then
Line Widths By > Data Size (MB), you'll see that almost all of the I/O cost of this query - while still estimated to be in the Sort - is actually in the Table Spool related to
Again, the actual costs involved in the plan you have (especially if you run it again) are unlikely to match this graphic exactly, but you can see as you switch between these three views the difference between how SQL Server estimated the costs would be distributed, how we observed them being distributed in practice, and how the costs stacked up when we took a focus on I/O:
I created two stored procedures in the sample database to make it easy to show how Plan Explorer can provide a lot of insight about a deadlock. In the screen shot below you can see that we include a grid with information such as isolation level, the statements that deadlocked, and even the procedures they came from - without having to reverse engineer things like ObjectIDs and HobtIDs. Below that there is a deadlock graph with a more intuitive layout than the one you get by default - again no reverse engineering of object names, and the flow clearly indicates the order in which locks were taken - which can help you solve the deadlock quicker.
In this case, the problem is simple - the two stored procedures named in the grid have batches that update two tables in the opposite order. If you wrap those procedure calls in outer transactions, SQL Server has no alternative - it needs to make one batch the victim and let the other one succeed.
This is just scratching the surface, but there are multiple other things you can show to see how much quicker it is to analyze plans within Plan Explorer, from just running queries to get runtime metrics more conveniently, comparing plans before and after changes using the History feature, spotting residual I/O, seeing the impact of updates on non-clustered indexes, and many other things. This kit is meant as a starting point, but you are more than welcome to generate plans from your own queries and make your demos your own.
If you have any questions or comments about the demo kit, or Plan Explorer in general, or anything really, feel free to hit us up.