SQL Server Query Performance Tuning - March 21, 2013

Question

Below are answers to questions asked during the recent webinar. If you have additional questions, please contact Thomas LaRock at thomaslarock@confio.com Q: What default Sql Server parameters would you change to improve performance? A: That really depends on the specific resource bottleneck. For example, if you are bound by disk I/O then there is no parameter option in SQL Server for you to change in order to improve performance. If you are seeing memory pressure than it may be possible to adjust some parameters in order to improve performance. Using tools like Ignite8 and IgniteVM really help DBAs to answer two questions: (1) Is this a problem? And (2) What actions can I take? Q: Is DISTINCT better or worse than GROUP BY? A: In most cases the performance is going to be nearly identical. I’d encourage you to test both ways to verify performance for your particular scenario. The GROUP BY command uses a DISTINCT under the covers, you can see this by examining the execution plans when comparing the two statements during your testing. The rule of thumb I use is this: use DISTINCT whenever I want to eliminate duplicate rows from my result set and use GROUP BY whenever I am applying an aggregate (SUM, AVG, etc.). Q: Can Tim and Tom speak to implicit conversions and functions on columns? A: We touched upon scalar functions in the webinar. Implicit conversions are likely to result in the optimizer needing to do a table scan and not take advantage of an existing index. I somehow read about downsides using BETWEEN. What's a better way to use “>=” and “<=” ? - Franklin Cid Franklin, the BETWEEN syntax is equivalent to using “>=” and “<=”, there is no performance impact. You can verify this by examining the execution plans. It is worth noting that if you don’t want to include the endpoints in your range then you must use “>” and “<” explicitly. Q: What are the best practices for fine-tuning distributed queries using Linked Servers? All data sources are SQL Server instances some with incompatible collations. A: The best practice is to make certain that as much work as possible is done on the remote server. Otherwise you run the risk of SQL Server processing one row at a time using the API Server Cursor that is embedded in the SQLOLEDB native provider. You can avoid this row-by-agonizing-row (RBAR) processing with the use of the OPENQUERY function. Here’s an article that helps to explain the difference: http://www.sql-server-performance.com/2006/api-server-cursors/ Another point to mention about the use of linked servers. The login you use to connect to the remote server needs to have at least DDL_Admin rights otherwise statistics are not used for the remote query. Q: Please go over determining the selectivity number. What is the formula? A: The formula is to take the number of rows needed from one table divded by the total number of rows for the entire table. So, if I need 18,000 rows from a 19,000 row table, then it becomes: (rows needed)/(total rows in table) = 18,000/19,000 = .94736 The idea is that the closer you are to zero then the more selective your logical set is, and that becomes a focal point for when building (or tuning) a query. Q: What about in a where cause “and exists (select * from People where ID = 44)” ? A: The EXISTS keyword will return either a TRUE or a FALSE. So if you have a WHERE clause with that statement you will get a result set only if there exists rows in the People table (or view) with an ID equal to 44. As far as performance goes, you will want to test thoroughly. Q: What is the alternative for CASE, CAST, CONVERT to improve the performance? A: It really depends on your query, and where these keywords are being used. There isn’t necessarily an alternative for each one (in fact, the CAST statement uses CONVERT under the covers), but rather if you find these being used frequently in sub-queries it could be the case that you will want to reconsider how the query has been written. Q: What is a GUID? A: GUID is an acronym for “globally unique identifier”. It is a datatype common selected in order to ensure uniqueness for objects such as primary keys. The downside to using GUIDs results from the fragmentation that will happen as more data is inserted into the table that is using a GUID as part of a key, constraint, or index. Q: What is better a covering index or an index with INCUDED columns? A: If you were to include all of the columns necessary then you would have the equivalent of a covering index but without the overhead of having to maintain a separate index used just for covering a query. I always advise that a person create their non-clustered indexes using the columns needed for searching and lookups as the index key columns. Then, add in the remaining columns as included columns, as that helps reduce the need to hit the clustered index to retrieve the additional data. Q: Duplicated indexes in my environment can be one problem? A: Yes indeed, duplicate indexes can be an issue for you. The biggest pain point will be decreased performance for every delete, update, and insert statement that hits that table. Each one will need to update that duplicate index, causing more overhead than necessary. You will also have additional storage costs, as the duplicate index is being stored on disk, too. Additional data, additional storage, longer maintenance times, etc. Q: On the estimated count, does that depend on the interval that you update stats? A: Yes it does. The estimates are done based upon the currently available statistics. Q: Please explain about the use of TempDB A: Tempdb is used for a variety of reasons, sometimes good and sometimes bad. The use of tempdb is not, by itself, a bad thing. If you want to know more about the use of tempdb I would point you to this presentation by Bob ward from the 2011 PASS Summit: http://www.youtube.com/watch?v=SvseGMobe2w&feature=youtu.be Q: Can you give us an example of one situation where a scan is better than a seek? A: Whenever the optimizer knows it is easier to scan a table, it will. An example of when it would not be easier is if there was an index but the use of a key lookup would result in a higher cost than if the table was scanned. For small tables it is typically easier to do a scan rather than a seek. Q: What is parameter sniffing? A: Parameter sniffing is how SQL Server works, and it is usually a good thing. SQL Server will cache query plans for reuse. For parameterized queries the cached plan will be built upon the statistics for the set of values that were used the first time the query is executed. Each subsequent query will use also use that same plan, regardless of the values being used as parameters. For systems with a fairly even distribution of data this works just fine, as each plan is likely to be similar anyway. The problem (i.e., “bad” parameter sniffing) arises when the cached query plan is not the optimal choice for the set of values being passed. This can be the result of the data having changed, or statistics being out of date, and the plan ends up taking much longer to retrieve the data set than desired. One common workaround to this issue is to use the RECOMPILE hint in order to force the optimizer to create a new plan for each execution. For a majority of your queries the RECOMPILE hint will likely not be necessary, but you could have a few situations where you may see some benefits. Q: When do you want to partition your indexes? A: I usually advocate that the tables themselves are partitioned. That way any indexes you build will also be aligned automatically with the base table partitions. I advocate the use of table partitions whenever your queries are mostly going to be hitting just a subset of your data and you can create a sliding window based upon a specific date period (i.e., month, year, etc.) and for when you have tables approaching billions of rows. Q: Did Tim say that the use of temp tables will force a recompile each time? A: No, I don’t believe Tim said that. We were discussing the use of table variables, and that the RECOMPILE hint would force a recompile each time. Q: What is better to use in a query? Table variables or temporary tables? A: That depends on the query, of course. The biggest difference I see between the two is that temp tables allows for the creation of statistics whereas table variables do not. So if I have queries that will benefit from statistics (or indexes) then I would rather use a temp table instead of a table variable. At the end of the day though I would recommend you test both methods in order to be certain you are happy with performance. Q: Just to clarify, option recompile appears to be beneficial in optimizer assistance. Could you weigh in on the recompile cost? A: There will be some CPU cost for each recompile. Depending upon the volume of sessions trying to execute that statement, this CPU cost may exceed the benefits you get by having the optimizer create a new plan each time. Q: Do you need the Option Recompile in the query each time it is run or does it just need to be run once? A: That option is a query hint, it gets called every time the query is run. Q: With nested views, do you suggest starting with the deepest view and work your way up? A: That’s as good a place as any to start! Ultimately you need to know what the goal of the query was when it was written and see if there might be a better way to get the job done. Nested views by themselves are not necessarily a bad thing, but they can be problematic over time as the nature of your data changes. Q: Will creating indexed views on a large table and using that indexed view for reads help in performance? How about creating an indexed view? Technically you aren't touching the base table but queries that do can use the index from the view right? A: Indexed views can give you a boost in performance, no question. One of the benefits of indexed views is that queries against the base table can use the indexed view; there is no need to explicitly query against the view in order to get the benefits of the index. The clustered index is a separate structure than the original base table, so you can think of interacting w/ the indexed view as though it is a table. However, indexed views have restrictions that come with them, and it may be the case that while you help query performance for one query you end up hurting performance in other areas. I would recommend that you only use an indexed view on a table that has fairly static data otherwise you are going to end up with additional administration and maintenance overhead as you try to move data in and out of that base table. Q: Should the order of the join columns and the order of the fields in the where clauses, mirror the order of the fields in the index that will be used? A: No that is not needed. The WHERE clause can have those columns in any order, the speed of the data retrieval won’t be affected. It is more important that your query call the columns that exist in the index, regardless of ordering. Q: I created a filtered index and it killed the app. Do you know why the app had issue with filtered index? A: Not without examining your query, no. However, I can suspect a few common reasons as to why that may have happened. One reason is that the filtered index contained a majority of the rows in the table. Another reason is that the statistics for the filtered index may become stale quickly. You will want to run an UPDATE STATISTICS WITH FULLSCAN on the columns in the filtered index on a regular basis to make sure the statistics are kept up to date. Also, you could have made a filtered index and expected your queries to use it, but if it were parameterized or using local variables then it wouldn’t be used. That could have made your query slower – especially if you created the filtered index and dropped the “old” one. Q: What is "reduce LIO"? A: “LIO” is an acronym for “logical I/O”, which is the number of pages moved through memory in order to satisfy a query result set. Q: When joining multiple tables and you have a filter/where clause on each table, should you add the where clauses in one big where clause at the end of the query or as part of... A: Sorry but your question was cut off. If I understand it correctly the answer is that for queries with INNER JOINs it will not matter. For queries using RIGHT, LEFT, and OUTER joins, then you will want to take care to put the right filters in place at the JOIN level or the WHERE clause. Q: Can that bit about turning on ad hoc be repeated? As it relates to bloated plan cache? A: You can enable the ‘optmize for ad-hoc workloads’ option in SQL 2008 and later versions. This option helps to limit the issue of plan cache bloating caused by many ad-hoc queries flooding your plan cache. Q: Isn't the wildcard %? A: Yes. Sorry if I stated otherwise. Q: How about MAXDOP query hints? A: I don’t recommend using MAXDOP hints except in very rare cases. I’d have to see a query consuming all logical cores for an extended period of time and then test to see if the MAXDOP helped to improve performance before considering making that a permanent code change as a query hint. Another point to note is that while the MAXDOP hint will override the server configuration MAXDOP level the hint will NOT override the resource governor. Something worth noting should you be using resource governor currently. Q: How does the width of the clustered index affect the non-clustered indexes? A: Every non clustered index includes the clustering key along with a uniqueifier (for non-unique clustered indexes). So it is the width of the clustering key that gets duplicated into every non-clustered index. Q: Can you elaborate on why table hint makes you ask questions? A: If I see a query hint then I ask “why”? I want to know what problem the query hint was trying to solve. The optimizer is pretty smart. If someone uses a query hint then to me they are saying “I am smarter than the optimizer”, and I simply want to know more about what it is they are trying to accomplish. Q: I think you did not mention what role the MAXDOP option plays in performances. A: You are correct, we didn’t mention the MAXDOP option. That option will limit the number of logical cores an operator in a query is allowed to use whenever the optimizer decides to execution a plan in parallel. In rare cases you may want to use this hint, but most of the time the optimizer makes the right choice with regards to parallelism. Q: In your experience & opinion which type of query tuning do you find the most challenging to troubleshoot? A: What a great question! My short answer is “No, we can fix them all.” The queries most people find hardest to troubleshoot are the ones that have no defined baseline for performance (i.e., someone comes over to say things are “slow” but they have no idea how fast they should be) and they have not been collecting any performance counters for review. It’s really hard to fix something when you don’t know what “normal” means. That being said, focusing on reducing the amount of logical I/O is a great way to improve performance for most any query. Q: What was the name of the book that you referenced when talking about Selectivity? A: The name of the book is SQL Tuning and you can get more details at http://www.singingsql.com/index.html