Enabling Higher Resolution Performance Charts in SQL Sentry
When troubleshooting a complex SQL Server performance problem, having high resolution information can make all the difference. With resolution down to 10 seconds whether you are monitoring 5 servers or 500, this has always been a strength of SentryOne. However, there were limits to how wide you could zoom and how far back you could go before aggregations would kick in and some resolution would be lost.
In this post, I’ll cover some exciting enhancements for enabling higher resolution performance data in SentryOne in order to maximize effectiveness when troubleshooting and optimizing SQL Server performance. You can apply these enhancements directly to your existing SentryOne database via the T-SQL scripts provided in the Getting the Enhancements section below.
SentryOne uses multiple levels of data aggregation to facilitate easy and efficient viewing of performance metrics over longer time periods. Think of it like Google Maps—the farther you zoom out, the greater the aggregation and lower the apparent resolution. If you are viewing a range of 30 minutes or less in SentryOne, you will always see raw performance metrics with a resolution between 10 seconds and 1 minute. When zooming out beyond 30 minutes, the resolution tips to one of these aggregations depending on the range size, which we call rollup levels (rollups for short):
We aggregate to ensure good performance when quickly navigating around to different ranges—performance while querying the data from the SentryOne database, pulling it across the network to the desktop or web clients, and ultimately rendering it on charts. When we first developed the SentryOne performance charting, the maximum datapoints we could load per series before performance started to suffer was around 250, and this was a primary driver for the current ranges.
As you zoom out and the data is averaged over larger intervals, the peaks and valleys are naturally reduced. This “smoothing” effect can mask significant spikes in key metrics and cause potentially important performance problems to be missed. The two screenshots below show system CPU with 10-second data at 30 minutes and 2-minute data at 1 hour, respectively:
10-second data at a 30-minute range
2-minute data at a 1-hour range
As you can see, the increase in range size and averaging that occurs with aggregation reduces the apparent resolution of the chart and masks those CPU spikes.
The SentryOne Data Science team experimented with various approaches to minimizing the smoothing effects, and although different techniques such as Power BI’s high-density sampling can work fairly well, we ultimately concluded that the best way to address smoothing is to prevent it from ever happening. The most effective way to do so is to increase the maximum series datapoints allowed for each rollup range. With faster modern CPUs and graphics cards, and performance optimizations to both the charts and the SentryOne database, it’s now possible to load many more datapoints without any slowdowns, and without resorting to dynamic binning which can impact performance.
That said, if you have the free SentryOne Scalability Pack applied to your SentryOne database, you can dramatically increase the maximum ranges for raw data and rollups. The Scalability Pack uses partitioned clustered columnstore indexing and In-Memory OLTP to ensure that even at larger ranges the data comes back lightning fast. Check out Melissa Connors’ (b|t) blog post here and the Microsoft case study here to learn more about how we leverage columnstore and In-Memory technologies to scale the SentryOne monitoring environment to unprecedented levels.
Here are the current and new ranges:
And here is that same CPU screenshot from before, but at a 2-hour range with 10-second data:
Below is the full SQL Sentry Performance Analysis Dashboard for the same range:
On all charts, those peaks are still easy to spot even at the much larger range size!
The larger max ranges are great, but they are only part of the solution for addressing the smoothing issue. Although we keep rollup data for longer than raw data, there are limits to how far back you can navigate before you will run out of data for the active rollup level and have to tip to the next higher level, thus again reducing resolution. The new larger ranges will only increase the likelihood of this occurring. If you go back far enough in time where we can’t effectively render the current range, such as a 30-minute range at a point where we only have 10-minute data (only 3 datapoints), you’ll receive the following message:
This can be frustrating if you’re attempting to look back at an incident that occurred a couple of weeks ago and compare it to a recent incident, for example.
When we introduced the Scalability Pack, which has been oriented more toward larger environments with >250 monitored servers, customers got back loads of space from CCI’s roughly 10x compression over rowstore. In the customer environment shown in the screenshot below, disk space was reduced by almost 600GB and buffer usage by 60GB:
The SQL Sentry Indexes view shows space and buffer used by the SentryOne database. The red arrows indicate the point at which the Scalability Pack was applied.
Most customers were happy to take the space savings, and no one was calling the storage team to give back all that precious flash. Some enjoyed the additional headroom for growth, while others decided to put it back to use by increasing the default performance data retention settings and keep more data. Those users have experienced no degradation in performance, and this is directly attributable to the smaller buffer footprint and reduced memory pressure that comes with partitioned CCI and partition and segment elimination.
Based on these real-world observations and our own testing and analysis, we have formulated a new retention scheme that should work well for anyone with the Scalability Pack, assuming adequate server resources are available, primarily storage space. The matrix below lists the current and new retention settings:
By default, the retention for all raw performance data is increased from 5 to 15 days, and 2-minute retention jumps from 5 to a whopping 59 days! When combined with the new larger range sizes, this enables you to go back almost 2 months while viewing a 16-hour range and experience minimal loss of resolution.
The 2- and 3-day rollups are shown as disabled in the above matrix because with the increased retention for the 1-day rollup, they can typically be turned off, allowing you to reclaim the space that would otherwise be consumed by those levels.
If you have the Scalability Pack applied to your SentryOne database, implementing these new settings is easy! First, use this query to calculate your current usage; then, plug the Raw MB per Day and % of Raw Data for rollups into this spreadsheet for a rough estimate of the storage requirements. The chart below shows estimates at 1GB of raw data per day:
Retention Days and Estimated MB of storage at 1GB of raw performance data per day
After verifying that sufficient space is available to handle the growth, simply execute this script on the SentryOne database. Be sure to read the comments and make any necessary adjustments to the @BaseRetentionHours and @ScaleFactor parameters.
Soon, the Scalability Pack with these resolution and retention enhancements will be the default for new SentryOne installs. They are considered preview until that time, so use at your own risk for now. That said, the practical risks of these changes are minimal, and they should not cause any problems for your SentryOne environment when used properly.
If you upgrade SentryOne before these updates are integrated, the new retention settings will be unaffected, but you might lose the new chart range sizes. If so, simply re-run the top section of the script to get them back.
If you are an existing SentryOne user and don’t have the Scalability Pack but would like to apply it, please reach out to our exceptional Support team for assistance.
I hope you enjoy these improvements—I’m confident they’ll have a big impact as you use SentryOne day-to-day. Let us know how they work for you!