What is SQL CROSS APPLY? Guide to T-SQL APPLY Operator
Introduced by Microsoft in SQL Server 2005, SQL CROSS APPLY allows values to be passed from a table or view into a user-defined function or subquery. This tutorial will cover the incredibly useful and flexible APPLY operator, such as how the CROSS APPLY and OUTER APPLY operators work, how they’re like the INNER and LEFT OUTER JOIN, and give you some examples of both. All the examples use the AdventureWorks example database.
Later in the article, I’ll also discuss a highly pervasive SQL Server performance problem—one I still encounter with customers on a weekly basis. This problem is related to using the APPLY operator against a specific type of user-defined function. It’s so problematic and can overwhelm tempdb, bringing your SQL Server instance to a crawl!
An inline table-valued function (TVF) can be thought of as a parameterized view in that the function will accept parameters as arguments, but the body of the function only returns a single SELECT statement.
The beauty of an inline TVF is the code inside the function is executed in line with the calling SQL statement instead of getting called for each row being passed into the function. In the example below, a single @ProductID parameter value is passed in and a SUM of the UnitPrice column is returned:
Note the GROUP BY in the body of the function above. The reason I include the GROUP BY clause, although a single ProductID value is being passed in, is because of how values are returned with aggregate functions. The SUM aggregate function returns a NULL value if there’s not a GROUP BY clause included as part of the SELECT statement. Because NULL is a value, albeit an UNKNOWN value, those ProductID values passed into the function will still return a row even if there’s no value returned from the SalesOrderDetail table. With the GROUP BY clause included, if there’s no row found in the SalesOrderDetail table, no row is returned.
I’ll continue to reference this table as I explain the CROSS APPLY operator. Now, let’s look at how the example function, GetSalesByProduct, is called using CROSS APPLY.
When a value is passed from the table or view on the left of the CROSS APPLY operator, that row will only be included in the statement results if the function call on the right side returns one or more values.
Using CROSS APPLY, I can pass the ProductID column from the Production.Product table into the GetSalesByProduct function, which outputs the TotalSales column. Using CROSS APPLY in this way ensures the only rows from Production.Product returned are those with rows associated with them in the Sales.SalesOrderDetail table.
Unlike CROSS APPLY, which returns only Production.Product rows associated with the Sales.SalesOrderDetail table in the example, OUTER APPLY preserves and includes the table or view to the LEFT of the OUTER APPLY operator in the result set. OUTER APPLY is very similar logically to a LEFT OUTER JOIN, as a LEFT OUTER JOIN also preserves what’s on the left of the operator.
As values from this table/view are passed into the function or subquery to the RIGHT of the OUTER APPLY operator, if the function returns a value, it will be included in the results. If the function doesn’t return a value for the parameter(s) passed to it, a NULL value will be returned in the results.
In the output below, ProductID values 1, 3, 2, and 316 don’t have associated sales, so the TotalSales column from the Sales.GetSalesByProduction function returns a NULL.
The APPLY operator is also great for passing values into subqueries. The following query passes the SalesOrderID column from Sales.SalesOrderHeader on the left side to a subquery summing the UnitPrice and OrderQty columns from Sales.SalesOrderDetail. The way this is accomplished behind the scenes is like a FOR loop where every SalesOrderID value from Sales.SalesOrderHeader is passed to the subquery one at a time. This is implemented behind the scenes as a NESTED LOOP join.
You can see from the output of STATISTICS IO that a *lot* of IO is generated against the Sales.SalesOrderDetail table.
The reason for this is for every record passed into the function from Sales.SalesOrderHeader, a seek operation occurs for that value on the Sales.SalesOrderDetail table, as shown in the query plan below.
While APPLY operators share a similar logic with joins, using APPLY isn’t always the best way to write a query most efficiently.
In the join query below, I’ve rewritten the CROSS APPLY query from above to use a join clause to a subquery on the SalesOrderDetail table. The query optimizer notices the subquery isn’t necessary and extrapolates the query to be a simple INNER JOIN instead, where the join condition is essentially the where clause from the subquery above. This results in a much more efficient execution plan because the optimizer can use a MERGE JOIN rather than being forced to use an INNER JOIN as with the CROSS APPLY query above.
You can see from the output of STATISTICS IO and the query plan that the query was returned by simply scanning both tables and using the MERGE JOIN operator to combine the result sets.
You can learn more about using joins in our comprehensive guide to SQL Server JOIN types.
The APPLY operator is also used to pass values from a table or view into a Multi-Statement Table Valued Function (MSTVF). A MSTVF accepts zero or more parameter values exactly like an inline TVF. However, it also allows you to INSERT, UPDATE, and DELETE values existing in a table-variable (which is defined as part of the MSTVF structure).
While the utility of the MSTVF can be attractive, historically, these function calls can be an absolute nightmare in terms of performance. The first reason is for each value passed into the MSTVF, some space in tempdb must be allocated to place the table-variable (they’re NOT memory-only constructs). When many rows are passed to the MSTVF or the function is called overly frequently from multiple connections, this can create contention on SGAM and PFS allocation pages in tempdb. More recent versions of SQL Server have done a really good job of reducing this contention, but you can still overwhelm tempdb by calling these functions frequently with excessive data.
Another source of performance concern from MSTVFs is they’re not called inline—which means the code in the MSTVF is invoked for each row passed into it. This often results in a lot of unnecessary overhead—especially if the code in the MSTVF is simple (which is the case in the example below).
Even today, performance problems related to MSTVF overuse are still some of the most common issues we get called in to fix, and it’s the performance issue I alluded to in the introduction of this article. MSTVFs are attractive to use because of their flexibility, – but please be wary of them. It’s generally not too difficult to refactor database code to use something other than a MSTVF.
While inline TVFs and MSTVFs are highly similar in how they’re called and the results they return to the end-user, they’re not compatible objects. For example, if I try to ALTER the inline TVF I created above, I receive the following syntax error:
So, I’ll drop the previous inline TVF and replace it with the MSTVF.
The MSTVF is called the same way as the inline TVF—by using the APPLY operator. The example below passes each value from Production.Product into the MSFTF and returns all product values having associated records returned from the function call.
The execution plan shows a call against the Production.Product table and a call to the GetSalesByProduct MSTVF. This is a massive issue with looking at MSTVF calls in an execution plan because you can’t see what’s happening inside the MSTVF.
Also, looking at the output of STATISTICS IO isn’t much help as it doesn’t give any valuable information regarding the MSTVF:
To see the actual number of logical reads returned from the above query, I’ll use the Extended Event Profiler tool. To open this tool, expand the XEvent Profiler tool in SSMS Object Explorer and double click on the Standard profile, as shown below:
Once the tool is listening to events on the server, I can run the above query again and capture the output. Here you can see the logical number of reads from the query is slightly over 369K. Whoa!
A small change to using MSTVF can dramatically increase the number of logical reads for the query. Be aware of this common behavior when you see these in a production environment as they’re so commonly used. However, I still fix performance problems all the time by refactoring these problematic objects.
One great feature of using APPLY to pass values into subqueries is the subquery can return multiple columns. The following example uses two correlated subqueries to perform a running total of the number of rows in the Sales.SalesOrderDetail table and a sum of the UnitPrice by CarrierTrackingNumber. So, for each CarrierTracking number in the Sales.SalesOrderDetail table, a count of rows for each CarrierTrackingNumber is returned along with a running summation of the UnitPrice by CarrierTrackingNumber.
From the output, you can see as the CarrierTrackingNumber changes in the results, the aggregations reset but will continue to perform running aggregates for each CarrierTrackingNumber.
The problem with the above approach to reporting aggregate values in a result set is it’s incredibly inefficient. There were three scans of the Sales.SalesOrderDetail table to return the result set, incurring well over one million logical reads, as shown below:
By using CROSS APPLY with a single subquery that returns the necessary columns, I can cut down the number of logical reads and the number of touches on the Sales.SalesOrderDetail table.
Here, I’ve cut the number of logical reads in half versus using the multiple correlated subquery approach.
I can make this even faster by rewriting the above query to use the SUM and COUNT aggregate windowing functions. These built-in T-SQL functions provide the same functionality as the query above but with less complicated coding and more efficiency.
The number of logical reads dropped from close to 550K (shown above) to a bit over 365K for this query—VERY COOL.
The APPLY operator allows you to pass values from a table into table-valued functions and subqueries. Using APPLY, you can significantly expand database code functionality from what a simple join statement allows you. However, you must take care when using the APPLY operator as it’s not always the most efficient way to return results from the database.
My tip is to always be aware of how expensive the queries you write are, and try to make some extra effort to optimize your queries.
By collecting and grouping like queries into a centralized view, SolarWinds© SQL Sentry is built to help you better understand their total impact on database performance. You can learn more about SQL Sentry SSAS Top Commands feature here.
You can also explore the other SolarWinds database solutions designed to make optimizing queries easier by helping you quickly pinpoint issues, tune performance, and improve overall database health.