Mining Performance Data from SQL Sentry (Part 5)

Introduction

As I promised in Mining Performance Data from SQL Sentry (Part 4) , we're going to start digging into database objects that will let us mine current, and historical, performance data from the SQL Sentry database, starting with this post.

There are quite a few objects that store metadata or data concerning performance counters, so we'll start with some basic definitions, and work our way toward more useable queries along the way.

Since wait stats are generally a good starting point for troubleshooting performance, I'm going to tackle those first, so today we're going to learn about the various tables that provide information on wait statistics in the SQL Sentry database.

Definitions

In order to understand performance data from SQL Sentry, some terms need to be explained. I'll be referring to these terms often, so it's best to get a clear understanding of them now.

  • Counter(s): In SQL Sentry terms, a "counter" is any instance of a performance measurement. They could be from any source, and thus are not strictly limited to counters from Windows Performance Monitor (PerfMon). All counters have a category and both internal names and external names, external names being the names suitable for displaying in a report. There is a lot more we can review on counters when we dive into the counters table later on.
  • Timestamp: In terms of Performance Advisor, a timestamp is a numeric value that represents the number of 5 second intervals that have passed since midnight on January 1st, 2000. There are functions in the database that allow us to easily perform conversion on these both to and from datetime values, but if you're going to be mining performance data from the SQL Sentry database, it's a good idea to understand what the raw value is when you see it.
  • Rollup Level: SQL Sentry employs the concept of various rollup levels to manage performance data retention. Each rollup level changes the resolution of the data. The lowest level is detailed data. At each level, the PerformanceAnalysisDataRollupLevel table defines what resolution that rollup level is, and which data table you can expect to find values for that rollup level. This is a very high-level description of this process, but I will explain it a little bit further when I describe the PerformanceAnalysisDataRollupLevel table below.

Waits are Counters

In SQL Sentry, wait types are counters. They may also be grouped into classes or categories to make troubleshooting faster, so if I want to build a query involving waits, I'm going to need to understand the table that holds counters, as well as some tables that specifically expand on wait types.

Now, because we have to look at counters, we also have to look at counter names, categories and category names. We're going to cover several tables here, so I'm not going to explain every single column, but I'll cover the ones we're going to use in the final query.

PerformanceAnalysisCounterCategory

image_thumb26_Compressed
PerformanceAnalysisCounterCategory

Here we're interested in CategoryResourceName, and ID. What about all that other stuff you say? I'll get to that another day, we have a lot to go over in this post, and most of it, while very interesting, is just not relevant to what we're doing here, so we're going to skip it for now. CategoryResourceName gets us our clean name from the next table, and ID is a foreign key in the counter table.

PerformanceAnalysisCounterCategoryName

image_thumb27_Compressed
PerformanceAnalysisCounterCategoryName

In this table we're interested in the CategoryName, which is the clean name for the CategoryResourceName from PerformanceAnalysisCounterCategory. This pattern is used so that we can have system names for use in the application, and clean names for use in reporting.

PerformanceAnalysisCounter

image_thumb28_Compressed
PerformanceAnalysisCounter

CounterResourceName lets us get the clean CounterName from the counter name table. PerformanceAnalysisCounterCategoryID is the foreign key back to the category table.

PerformanceAnalysisCounterName

image_thumb29_Compressed
PerformanceAnalysisCounterName

CounterResourceName joins back to the PerformanceAnalysisCounter table allowing us to use this table to get the clean name for counters, and CounterName is the actual clean name for the counter.

Next, we have some tables that define metadata specifically surrounding wait type counters. As previously stated, there are categories, classes and, of course, types.

PerformanceAnalysisWaitTypeCategory

image_thumb30_Compressed
PerformanceAnalysisWaitTypeCategory

Just a simple name and sort order for our wait categories. Table values include Disk, Memory, Network, and Other. CPU is not included in the table because all signal waits are considered to be CPU waits. We will have to handle this case in our query. Other waits are considered to be wait types that are not directly attributable to a single category.

PerformanceAnalysisWaitTypeClass

image_thumb31_Compressed
PerformanceAnalysisWaitTypeClass

Classes are normally viewed in the sample mode of Performance Advisor, or in reports, and are just a slightly more granular level than categories, but not as detailed as the raw wait types. They will usually be named closer to an actual subsystem like "Transaction Log" or "Latches: Buffer IO."

PerformanceAnalysisWaitType

image_thumb32_Compressed
PerformanceAnalysisWaitType

The previous tables provide the metadata we will need to build our wait statistics query, which will include average waits by server and class. We'll also get to see how the "CPU" category is a special case when we're working with wait statistics by including the percentage of signal waits, even though we're not using categories in our query.

Roll it Up

SQL Sentry collects a lot of data. No really. It's a lot. If we held onto every bit of it, the repository would become unmanageable before long. In order to keep things useable, the system employs a self pruning process that rolls performance data up to a higher resolution as it ages. I'm not going to go into all of the intricacies of that system here, but I will explain the naming convention of the data tables, and how to tell what level of detail is in a given table.

As I mentioned briefly in the definitions area, understanding rollup levels starts with understanding the data stored in PerformanceAnalysisDataRollupLevel

PerformanceAnalysisDataRollupLevel

PerformanceAnalysisDataRollupLevel
PerformanceAnalysisDataRollupLevel

There are several important columns in this table.

ID is not only the key for the table, but it is also the suffix you will use as the name for the data table to go to once you determine the rollup level you want to use in your query.

LevelBreakMinutes tells us the resolution. 2 means that each data point is an average over 2 minutes, and 60 means that each data point is an average over 1 hour.

LastRollupTimeStamp tells us how fresh that rollup level's data might currently be.

Enabled actually controls whether or not SQL Sentry is even using that rollup level. For instance, by default, the 5-minute rollup is not enabled. You can turn it on by setting Enabled to 1 in this table and restarting your SQL Sentry services. By performing this one action, you will cause SQL Sentry to generate performance data for you at a 5-minute resolution for your own reporting purposes.

HistoryDataRetentionHours states how long we're going to keep that rollup level data for. After this amount of time is exceeded, the data is purged. Adjusting this is possible, but you must remember that increasing it directly impacts storage requirements, and quite possibly performance.

Most of the rest of the columns are used for internal purposes, or for the SQL Sentry client.

Performance Data

Performance data for wait statistics is stored in the PerformanceAnalysisData table at the detail level. Detail data for disk counters is stored in another table, which we will talk about in another post.

PerformanceAnalysisData

image_thumb34_Compressed
PerformanceAnalysisData

We probably recognize DeviceID and EventSourceConnectionID from earlier posts.

PerformanceAnalysisCounterID is the foreign key to PerformanceAnalysisCounter.

Timestamp is the Timestamp value discussed in definitions above.

Value is self-explanatory.

InstanceName is the instance name for the counter, as I'm sure you're all aware performance counters can have multiple instances.

Now, here is where this gets interesting. When we want to get some nice juicy historical data, we know it may have been purged. Let's say we want something from 30 days ago. Ok, so 30 * 24 == 720. The first "Enabled" rollup level with a retention higher than 720 is 8. So, the way this works is that, to guarantee I can get my 30 days' worth of data I need to query against the table called "PerformanceAnalysisDataRollup8."

PerformanceAnalysisDataRollup8

image_thumb35_Compressed
PerformanceAnalysisDataRollup8

This table is similar to the basic data table, but you may notice that it has some other values in it that we can use like the Min, Max, Avg, and StdDev. These columns let us know a little more about the rolled-up value since we've lost a bit of detail in the rollup.

As I briefly mentioned above, at the detail level, disk counter data is separate from other data. In the rollup tables they are, merged into a single table making rollup tables good for reporting purposes.

Putting it all Together

Finally, we can take most of what we've been over here and put it into a query that will tell us a bit about our environment.

In this query, I want classes per instance, for every SQL Server instance in my environment. In addition, I want to know what percentage of the waits resource waits are, and what percentage are CPU (signal) waits. I want to get this information for the 2-minute rollup level.

Here is what I came up with using the tables above:

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;

;WITH cteWaits (
     DeviceID
     , ConnectionID
     , Class
     , WaitTime
     )
AS (
     SELECT dat.DeviceID
          , dat.EventSourceConnectionID
          , waitclass.Name AS Class
          , SUM(dat.Value) AS WaitTime
     FROM dbo.PerformanceAnalysisDataRollup2 AS dat
     INNER JOIN dbo.PerformanceAnalysisWaitType AS waittype
          ON dat.PerformanceAnalysisCounterID = waittype.WaitCounterID
     INNER JOIN dbo.PerformanceAnalysisWaitTypeClass AS waitclass
          ON waitclass.ID = waittype.WaitTypeClassID
     GROUP BY dat.DeviceID
          , dat.EventSourceConnectionID
          , waitclass.Name
     )
     , cteSignalWaits (
     DeviceID
     , ConnectionID
     , Class
     , SignalWaitTime
     )
AS (
     SELECT dat.DeviceID
          , dat.EventSourceConnectionID
          , waitclass.Name AS Class
          , SUM(dat.Value) AS SignalWaitTime
     FROM dbo.PerformanceAnalysisDataRollup2 AS dat
     INNER JOIN dbo.PerformanceAnalysisWaitType AS sigwaittype
          ON dat.PerformanceAnalysisCounterID = sigwaittype.SignalWaitCounterID
     INNER JOIN dbo.PerformanceAnalysisWaitTypeClass AS waitclass
          ON waitclass.ID = sigwaittype.WaitTypeClassID
     GROUP BY dat.DeviceID
          , dat.EventSourceConnectionID
          , waitclass.Name
     )
SELECT c.ServerName
     , InstanceName = COALESCE(c.InstanceName, 'DEFAULT')
     , w.Class
     , w.WaitTime
     , CASE 
          WHEN (w.WaitTime - COALESCE(s.SignalWaitTime, 0) < 0)
               THEN 0
          ELSE w.WaitTime - COALESCE(s.SignalWaitTime, 0)
          END AS ResourceWaitTime
     , COALESCE(s.SignalWaitTime, 0) AS CPUWaitTime
     , CASE 
          WHEN (COALESCE(s.SignalWaitTime, 0) * 100 / w.WaitTime > 100)
               THEN 100
          ELSE COALESCE(s.SignalWaitTime, 0) * 100 / w.WaitTime
          END AS PercentCPUWaits
FROM dbo.Device AS d
INNER JOIN dbo.EventSourceConnection AS c
     ON c.DeviceID = d.ID
INNER JOIN cteWaits AS w
     ON d.ID = w.DeviceID
          AND c.ID = w.ConnectionID
LEFT JOIN cteSignalWaits s
     ON s.DeviceID = w.DeviceID
          AND s.ConnectionID = w.ConnectionID
          AND s.Class = w.Class
WHERE c.IsPerformanceAnalysisEnabled = 1
ORDER BY c.ServerName
     , c.InstanceName
     , w.Class;

I used a couple of CTEs here, and an ugly join or two to get what I wanted quickly, but it certainly gets the job done. I've also done a little cleanup where the signal waits don't always apply turning the null values into 0s, just because I thought it looks better, but not really an absolute need.

Here are some of my results:

image37_Compressed

Next Time

I promise the next part of this series will be a bit shorter. Now that we know how to use these counter and data tables, we can get into some more interesting things, and we'll start by building a pivot query for server resource metrics in Part 6!


Mining Performance Data Series

THWACK - Symbolize TM, R, and C