An Introduction to SQL Server T-SQL ORDER BY Clause
Like the previous T-SQL SELECT DISTINCT tutorial, the ORDER BY clause is an optional part of the SELECT statement and orders the result set of a query by one or more columns in the ORDER BY list. This post will discuss the ORDER BY clause, including its use cases, edge cases to be aware of, and common performance considerations. Most of the examples use the AdventureWorks sample database.
Ordering results unnecessarily is one of the most common anti-patterns I’ve seen in my many years working with customers using SQL Server. While SQL Server is more efficient at sorting results than most custom applications, there is often little logical need to do so. While there are times when ordering a report by product or date makes sense, including the ORDER BY in every query as a standard practice is not only illogical but sometimes introduces an astounding amount of overhead to a SQL Server workload.
For example, a few years ago, I assisted a customer whose workload suffered from excessive tempdb activity. After conducting one of our health checks, I found a majority of the tempdb activity was a memory spill from one query on the customer’s table. After examining the execution plan, it was clear the spill resulted from a required sort operation caused by an unnecessary ORDER BY clause. Since the report didn’t require a sorted result, we removed it. Such a simple refactoring resulted in the query becoming 93% faster.
The general syntax for ORDER BY is as follows:
SELECT * FROM Sales.SalesOrderDetail ORDER BY UnitPrice ASC;
The ORDER BY clause uses one or more columns for sorting a result set. If multiple columns are included in the ORDER BY clause (using a comma-separated list), the names of the columns must be unique, and ordering is applied sequentially (from left to right) based on the columns supplied after the ORDER BY clause. The results are first ordered by the column or expression following the ORDER BY, then by the next column or expression in the list, and so on.
It’s worth mentioning using ORDER BY is the ONLY way to guarantee a given result set will always return with a specific sort order applied. Different access methods can cause data to be returned in different orders. While a query’s text may not change, its execution plan can change depending on various factors such as statistics or plan recompilation. If consistent result set ordering is required for a query, an ORDER BY clause must be used.
For each column listed in the ORDER BY clause, you can optionally specify if the column should be sorted in ASCending order or DESCending order, using the ASC or DESC keyword. ASCending ordering is the default and will produce a result set with the lowest values returned first and the highest values returned last. DESCending ordering returns the highest values first and the lowest values last. NULL values are returned together and are considered the lowest values. For string data types, such as varchar and char, alphabetical order rules are based on the collation used on those columns.
The following query is a simple example of how number values are returned using an ASCending order:
SELECT number FROM master..spt_values WHERE type = 'P' ORDER BY number ASC;
Notice the lowest values are returned first and larger values last.
Here is another example of number ordering, but this time in DESCending order:
SELECT number FROM master..spt_values WHERE type = 'P' ORDER BY number DESC;
Here the largest values are returned first, while smaller numbers are returned later in the results.
The ORDER BY clause permits programmatic expressions for defining custom ordering rules. For example, if you always need rows returned first in a result set having a specific status, followed by additional ordering rules. The T-SQL CASE statement works well for this type of logic. The following example uses a CASE T-SQL statement to assign the value 0 to the numbers between 1000 and 1005 and then order the resulting result set by the number column in ASCending order.
SELECT number FROM master..spt_values WHERE type = 'P' ORDER BY CASE WHEN number BETWEEN 1000 and 1005 THEN 0 ELSE number END ASC;
This kind of logic gives developers extra flexibility in determining how the query results are ordered.
For the logical processing order of a SELECT statement, the ORDER BY clause occurs last unless there is a TOP clause (which then always occurs last). Meaning, any column aliases defined in the SELECT statement can be referred to by name in the ORDER BY clause. Consider the following example, which assigns the alias of SalesID to the sum of the SalesOrderDetailID and SalesOrderID columns in the SELECT clause and then refers to the column alias in the ORDER BY clause. Ordering by an expression defined in a SELECT list is common practice.
SELECT (SalesOrderDetailID + SalesOrderID) AS SalesID, * FROM Sales.SalesOrderdetail ORDER BY SalesID DESC;
While it’s useful and commonplace to create a custom expression in a SELECT statement while using a column alias to sort the result set via the ORDER BY clause, one awful practice I’ve seen repeatedly through the years is to provide a column’s ordinal value to the ORDER BY clause. While the following example is perfectly valid syntax, it’s a poor programming practice because the column being ordered by (ProductID in this example) is based solely on the order of the columns defined in the SELECT list.
SELECT * FROM Sales.SalesOrderdetail ORDER BY 5 ASC;
The problem here is the query uses * in the SELECT clause, which is another bad practice. If the table structure changed through a reordering of the defined column positions, the output ordering of the query would be different. As a rule, always provide a name for the column/expression on which the result set is sorted.
Indexes are data structures used to improve the speed of record retrieval. Indexing occurs by maintaining the leaf-level pages of the index in a logically-sorted order based on the leftmost column in the table index definition. The pages in the index are linked in order, making it possible for the index to support an ORDER BY clause without introducing a sort operator into the execution plan.
Consider the following nonclustered index on the ProductID column in the Sales.SalesOrderDetail table:
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID ON Sales.SalesOrderDetail ( [ProductID] ASC );
When this index is created, the data in the ProductID column is scanned, sorted, and then stored in the nonclustered index structure. The logical ordering of the data in this index is then maintained by SQL Server for the lifetime of the index.
If a subsequent query is executed against the Sales.SalesOrderDetail table with only the ProductID column being returned, only the nonclustered index needs to be accessed to return the requested data, as shown below:
SELECT ProductID FROM Sales.SalesOrderDetail ORDER BY ProductID;
Since nonclustered indexes contain a copy of some columns from the base table and the pages in the index are sorted, using the ORDER BY clause in this query doesn’t introduce a sort operator into the execution plan.
When adding columns to the column list in a SELECT query, things get more complicated. The following example uses a SELECT * instead of the single specified ProductID column. Since select * is now returning every column in the table, the query can no longer rely solely on the nonclustered ProductID index, so the additional columns must be gathered directly from the Sales.SalesOrderDetail table.
SELECT * FROM Sales.SalesOrderDetail ORDER BY ProductID;
Note: I’m using the SELECT * here because I see it often in customer systems. However, returning all columns from a table isn’t the only way to force SQL Server to reference the base table. Simply selecting a column not included in the nonclustered index will cause SQL Server to reference the base table, known as a key lookup or bookmark lookup.
How much of the base table is being accessed as part of a key lookup is important for performance purposes. Deciding to scan the entire table, or use the nonclustered index with a key lookup, depends on cardinality estimates and the predicates involved. Performing a key lookup is an expensive operation—so much so if SQL Server estimates more than ¼ to 1/3 of 1% of the number of pages in the table will be accessed as part of the key lookup, then SQL Server will simply scan the table. Scanning a table is a fixed-cost operation, solely dependent on the number of pages in the table. However, a key lookup is a linear-cost operation, dependent on the number of rows accessed by the outer table in the execution plan. Kimberly has written extensively on this choice, known as the “tipping point,” which you can find here.
The TOP clause is used to limit the number of records returned by a query. Using the TOP clause without an ORDER BY in a SELECT statement can be problematic because it isn’t deterministic—meaning the query may not return the same results with the same data set if you run it multiple times. Often, I see TOP (1) being used as a means for existence checking—essentially checking to see if a record meeting a certain condition exists. While TOP (1) can be used in this case, it’s much better to use the EXISTS clause for checking to see if a certain condition is met. EXISTS is a short-circuiting operator, returning TRUE whenever a condition is met. TOP (1) can be rewritten by the query optimizer in some cases to behave the same way as EXISTS, but it’s a much better practice to use the EXISTS clause instead.
Another thing to consider when using the TOP clause is the query optimizer will generally choose a plan to return the TOP number of rows as quickly as possible. For trivial queries involving a single table, this isn’t a big deal. However, when there are two or more tables being joined together, using this type of row-goal could result in the optimization engine choosing a nested-loop join to return a small number of rows quickly—even if choosing a merge join or a hash join would be a better choice for joining the involved tables.
One really useful capability of the ORDER BY clause is its ability to return only a subset (called a ‘page’) of the overall set of rows to be returned by the query. This OFFSET and FETCH functionality was introduced in SQL Server 2012 using optional keywords on the ORDER BY clause. These keywords allow you to programmatically return portions of an overall result set (ideal for returning data for web pages, for instance) in a much easier way than relying on the TOP clause, SET ROWCOUNT, or a cursor.
OFFSET is used to specify at which starting point the result set will begin returning rows. Consider the following query and results. Here I’m using the OFFSET clause to skip the first five rows of results and return every row after, starting with the SalesOrderDetailID (the identity column in this table) of six.
SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID ASC OFFSET 5 ROWS;
FETCH is used with the OFFSET keyword to specify how many rows to return in the result set. In the example below, I’m specifying to skip (OFFSET) the first five rows and then to return (FETCH NEXT) 10 rows only.
SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID ASC OFFSET 5 ROWS FETCH NEXT 10 ROWS ONLY;
The great thing about using the OFFSET and FETCH functionality of the RDER BY clause is you can use variables to specify the rows to OFFSET and FETCH NEXT. This proves to be extremely useful when designing web applications resulting in many pages of data being returned for the end user to consume. The query below shows an example of this.
DECLARE @Offset INT = 20; DECLARE @Fetch INT = 10; SELECT * FROM Sales.SalesOrderDetail ORDER BY SalesOrderDetailID ASC OFFSET @Offset ROWS FETCH NEXT @Fetch ROWS ONLY;
SQL Server does allow you to include an ORDER BY in a view definition, but only if a TOP or OFFSET keyword is also specified. Consider the following query—it orders the result set by the UnitPrice column and notice the Sort operator in the execution plan.
SELECT * FROM Sales.SalesOrderDetail ORDER BY UnitPrice ASC;
Next, I’m going to take the same query as above and try to create a view out of it, but I receive an error when trying to create the view. This is because an ORDER BY clause is not allowed inside of a view definition without also using a TOP or OFFSET keyword.
CREATE VIEW vwGetAllSalesDetail AS SELECT * FROM Sales.SalesOrderDetail ORDER BY UnitPrice ASC;
The way to avoid this error is to include the TOP clause in the SELECT statement with the PERCENT clause telling SQL Server to return all the rows from the table. At least, it’s what we’re trying to tell SQL Server to return. However, including the ORDER BY with a TOP in this manner does not guarantee SQL Server will return the contents of the view sorted by the UnitPrice column. In fact, when the view is queried, the data isn’t returned in the order specified in the view body, and no Sort operator is included in the execution plan.
CREATE VIEW vwGetAllSalesDetail AS SELECT TOP (100) PERCENT * FROM Sales.SalesOrderDetail ORDER BY UnitPrice ASC; SELECT * FROM vwGetAllSalesDetail;
Windowing functions are specialized functions operating on subsets of rows in a result set. These are incredibly useful functions allowing you to group values together and assign numerical values to rows within each grouping. The ORDER BY clause is part of each windowing function to assign numerical values to rows within each grouping.
In the following example, I use the ROW_NUMBER windowing function to add a monotonically increasing column to the result set, which is assigned based on the ordering of the ProductID column in ASCending order.
SELECT ProductOrdering = ROW_NUMBER () OVER (ORDER BY ProductID ASC), ProductID FROM Sales.SalesOrderDetail WHERE ProductID IN (897, 942);
The Segment and Sequence operators are used in the execution plan to assign the ROW_NUMBER value to the result set. Because the query only returns the ProductID column, only the nonclustered index on the ProductID column is accessed. This index is already ordered by the ProductID, so there is no need to inject a Sort operator into the execution plan.
If I change the query to return all the columns in the table, the query optimizer will revert back to scanning the clustered index, as shown below.
SELECT ProductOrdering = ROW_NUMBER () OVER (ORDER BY ProductID ASC), * FROM Sales.SalesOrderDetail;
The following query is more complicated, using two windowing functions.
SELECT ProductOrdering = ROW_NUMBER () OVER (ORDER BY ProductID ASC), CTNOrdering = ROW_NUMBER () OVER (ORDER BY CarrierTrackingNumber ASC) FROM Sales.SalesOrderDetail;
Notice how there are now two different Sort operators in the execution plan. Because I’ve included two different windowing function calls, and each function is ordering by a different column, two different sorts are needed to achieve the result. The query is NOT scanning the SalesOrderDetail table multiple times, but the data must be sorted differently for each function call.
Another practice I’ve seen repeatedly over the years is to use an ORDER BY clause when inserting data into a table. This is a practice I’ve seen abused significantly, and thankfully this requested ordering is largely ignored. Consider the following example of using a SELECT INTO statement with an ORDER BY clause to copy data from the Sales.SalesOrderDetail table into a new temporary table.
SELECT * INTO #TempTable FROM Sales.SalesOrderDetail ORDER BY UnitPrice DESC;
Notice in the execution plan no Sort operation occurs. SQL Server is ignoring the ORDER BY request in the query. This newly created #TempTable object is a heap, which has no inherent ordering, so the ORDER BY UnitPrice DESC part of the query is ignored.
It’s worth noting had we been using an INSERT…SELECT statement to add records to a table with a clustered index, the ORDER BY clause would still be ignored. Clustered indexes enforce a logical ordering of data in a table, so in the following example, you’d see a Sort operator in the execution plan, but the sort is used to order the data based on the ordering of the clustered index and not by the UnitPrice column. In this example, the #SalesTemp table is clustered on the SalesOrderDetailID column.
The ORDER BY clause is used to order query results and is the only way to guarantee the result set will be sorted consistently. It can be used to page results back to the end user through the OFFSET and FETCH keywords and can also be used with windowing functions to assign ranking to rows within a group. The ORDER BY clause is flexible: it allows for sorting by column names, column aliases, custom expressions, and ordinal column positions; however, such flexibility can lead to bad practices and overuse. There’s often significant overhead when using the ORDER BY clause as it typically requires additional memory grant requests to perform the operation and can dramatically slow down query execution. Only consider using the ORDER BY clause when necessary.
When making decisions about what type of clause to use, understanding SQL Server performance can be crucial. Having a unified solution like SQL Sentry server monitoring can provide you with high-level performance insights and the ability to dive deeper to optimize, maintain, and proactively prevent problems before they occur.