Data Mining: SQL Server Agent Job Performance

Recently, a SQL Sentry user asked me how they could monitor how much CPU their jobs were consuming. However, resource utilization is not tied to jobs explicitly but rather to the queries those jobs run. So, I started off by showing them how to gain some insight into this by column grouping in Top SQL. (If you are not already familiar with this feature, be sure to check out my blog post, “SentryOne Top SQL Tips and Tricks.”)

Data Mining SQL Server Agent Job Performance_Image 1

View of grouping by the Application column in Top SQL

You can see which job step caused the most CPU utilization during the time range in question.

The SentryOne user I was working with felt that this method would certainly be helpful in the heat of the moment of troubleshooting, but they were also hoping to do more analysis over time to pick up on trends, patterns, anomalies, etc. Additionally, since we capture this data through our lightweight trace/XE, SentryOne displays what is returned, which is a job step GUID. So, this might not provide a view of the impact of a job as a whole.

Note that if you quickly want to identify which job those Step GUIDs are tied to, you can run the following queries against the SentryOne database.

/*Use this to convert this binary value to the proper job id (ID Shown in App column in Top SQL) */
SELECT CAST(CONVERT(binary(16), '0xC21D10F87FCF1643A370FDEFD135C3DC', 1) AS uniqueidentifier)

/*Use the result returned in the where clause for the query below */
SELECT ObjectName FROM EventSourceObject
WHERE RemoteObjectID = 'F8101DC2-CF7F-4316-A370-FDEFD135C3DC'

In this blog post, I will be sharing a query I put together to data mine the SentryOne database for an aggregate view of Top SQL events by job and/or job step. This view could provide more insight in the resource utilization per job.

Data Mining SentryOne

Conveniently, the SentryOne database is open for us to query directly, so I grabbed the same query we use for aggregating results in Top SQL (Show Totals view) and modified it a bit. The way I went about this provides two distinctive results. The first aggregates ultimately based on text data and the database it was executed against, which will effectively show rows in the context of job steps. This provides more granularity, but if impact analysis/resource utilization for jobs as a whole is your goal, this might not give you what you need.

The second result set further aggregates by the actual jobs, sacrificing some step-by-step metrics, but provides metrics in the scope of each entire job execution. You can choose to comment one of the output queries as needed.

We will get into options a bit more here in a bit, but the default is an hourly break down, over the past 24 hours, sorted by TotalCPU descending.

Before getting too in the weeds with the code itself, let’s review the results. Bear with me here a bit, I threw the whole kitchen sink into the results bag to provide as much data as possible. You could certainly trim it down a bit.

Returned Data

Identity Columns

Data Mining SQL Server Agent Job Performance_Image 2
Example of Identity Columns in the result set

The big thing here that you do not directly get from Top SQL is the actual job the queries are tied to.

Core Performance Stats

Data Mining SQL Server Agent Job Performance_Image 3
Example of Core Performance Stats in the result set

I opted for measuring duration in seconds, but I understand in environments where jobs tend to run for several minutes, or even hours, minutes might be a better measure. The addition of a simple calculated column can address this.

Additional Performance Stats

Data Mining SQL Server Agent Job Performance_Image 4
Example of Additional Performance Stats in the result set

I also added some of the newer Memory and TempDB columns in case they were helpful. The primary thing to look out for here is large discrepancies in Granted Memory versus Required Memory. Large discrepancies between Granted and Required memory may indicate that, for one reason or another, the optimizer thought the query was going to need more memory than was required for execution.

The Procedure

Now that we understand the kind of data that is returned, let’s take a look at the procedure parameters. Below is the execute statement for the JobQueryPerformance procedure that pulls this data.

EXEC [JobQueryPerformance] 
   @TargetDeviceID
  ,@HourRange
  ,@OrderByMetric
  ,@EndTimeParam

@DeviceID

The DeviceID is used to map to the target that you want to pull these results for. You can find the DeviceID for any target by finding its ID in the Device Table.

SELECT ID, HostName FROM Device
    WHERE HostName like '%<ServerName>%'

@OrderByMetric

The OrderByMetric parameter allows you to define how the results will be ordered. For example, do you care more about CPU, Reads, etc.? I also threw in the option to do this based on the average or sum of these metrics. Depending on the workload, there can be dramatically different results when ordering off Total versus Average, so I wanted to be sure to provide both options.

'AvgCPU', 'AvgReads', 'AvgWrites', 'AvgDuration', 'TotalCPU', 'TotalReads', 'TotalWrites', 'TotalDuration'

@EndTimeParam

When pulling any data from SentryOne, you often need to specify a StartTime and EndTime for a range you want to pull data for. This is no different. If set to NULL, it will set to GETDATE(). This will be useful for auto logging this data.

Note, that if you have targets in several time zones, the timestamps provided and the returned data will be based on each target’s local time.

@HourRange

This is the range in hours in which you want to pull data for. This will effectively be subtracted from the EndTime to dynamically set the StartTime.

Note, that if you have targets in several time zones, the timestamps provided and the returned data will be based on each target’s local time.

Procedure Execution Examples

  • EXEC JobQueryPerformance 1, 24, 'TotalCPU', NULL
    • If this were run at midnight, it would pull data for a full 24-hour day and be sorted by Total CPU values descending.
  • EXEC JobQueryPerformance 1, 12, 'AvgReads', ‘2020-06-22 19:00:00.000’
    • This provides more of a business hours view. This would grab the 12 hours of data between 7am and 7pm. The data would then be sorted by average reads descending.

Additional Tweaks

Aggregation Segments

The default procedure will group the data by Day. But you might want a more granular view, such as hourly aggregates. This can be done with a few tweaks to the proc shown below.

  1. Update the Group By in the core data pull. I have commented this section in the core data pull for reference but note that it is the large “INSERT INTO #Results”. To switch from daily to hourly aggregates, simply uncomment the hourly segment.

Data Mining SQL Server Agent Job Performance_Image 5
Core data pulls GROUP BY

  1. If you opt to use the final SELECT which aggregates the results based on jobs as a whole, then follow the same steps as above but for the corresponding time index columns.

Data Mining SQL Server Agent Job Performance_Image 6
Aggregate output code

You might notice there is a minute breakdown as well. Note this is based on 10-minute segments if you want a more granular breakdown.

Application Filter

The purpose of this original code base is to pull performance aggregates for Agent Job executions. However, the code can be easily modified to open it up all Top SQL events or a different specific application. To do so:

Update or remove the ApplicationName filter from the WHERE clause of the core data pull query.

Data Mining SQL Server Agent Job Performance_Image 7
Core data pulls WHERE clause

Disclaimer: Note that this data comes from the Completed Queries data set, which by default collects queries running for 5 seconds or longer. Understand that if jobs have steps that run in less than 5 seconds, these steps will not be included in this data. Also, you might see the Index columns get flagged by Intellisense. This can be ignored. It is simply due to me adding those columns on the fly.

Conclusion

This is yet another example of being able to directly query the SentryOne database for data that meets your needs. My hope is that you find this helpful and can use this as an example of how you can query the SentryOne database.

THWACK - Symbolize TM, R, and C