How to Win With the New Health Score

2020 may have been a year to forget for a lot of us, but it did bring us one amazing thing: the SentryOne Portal. Now branded SQL Sentry Portal, it was an evolution of the web front-end for SolarWinds SQL Sentry. Suril Jasani wrote a great article for its release, back before the pandemic changed all our lives, and I’d encourage you to read it to see how it all came together.

EHO

While great development continues across all of SQL Sentry Portal, this article focuses on the Environment Health Overview, or “EHO,” section.

When opening any version of SQL Sentry Portal up to 2023.2 (foreshadowing anyone?), you’re greeted with a large Health Score number attempting to summarize the overall health of your entire SQL Server estate.  It’s been in SQL Sentry as long as I can remember and was no easy feat to build and calculate.  To oversimplify the algorithm for you: we gather all alerts and events generated in the environment, score them, and perform an aggregation to determine a singular value.

While I was a customer of SQL Sentry, leading a team of DBAs, I always had two gripes about this metric:

  1. Is the score better or worse than yesterday? How about last week?
  2. Is this number good or bad?

For some environments, a Health Score of 67 might be fantastic.  For others, it might be dreadful. Context and history matter.

When our Product Manager asked for feedback on how to revamp the SQL Sentry EHO, I was all too eager to lend my thoughts.  It’s also what got me on the hook to write this article.

Historical Health Score

Let’s dive into how we’re iterating on and improving these areas in SQL Sentry 2023.3 with the “Enhanced EHO.” If I’m a DBA and my environment’s score drops 10 points overnight, I’d want to know why. If it climbs 10 points, I’d take a victory lap and show it off to my boss, but I’d still want to know why it happened. Thanks to some hard work from the engineering teams, the entire top section of EHO now plots these scores over time.

Pretty snazzy, right? We can now see exactly how the score for our whole environment has trended over a window of up to 30 days.

Server-level Health Scores in SQL Sentry

Prior to 2023.3, the “All SQL Servers” tab in EHO gave the best estate-level view (in my opinion) of watched targets.

This is typically where I would direct customers who wanted more of a NOC or executive-level view. Unfortunately, the Health Score on that tab isn’t color-coded, it gets lost among the other metrics, and the cards are too large. We decided to add smaller versions of these cards to the front page, directly beneath the historical bar graph, sorted by Health Score from lowest to highest to keep the view clean and concise.

Now we can quickly spot problem targets and start taking meaningful steps to improve their health.

Note that as of 2023.3, the “Critical” and “Unhealthy” filters are selected by default, but the “At Risk” and “Healthy” filters are not. You may need to select them to see all your targets.

Delta Scores in SQL Sentry

Now for the feature I’m most excited about. SQL Sentry Portal now shows historical environment-level scores, and you can view current per-target scores faster. But how will you know whether individual targets are changing over time? Enter the Delta feature. Selecting the Delta button switches the display from each target’s absolute Health Score to the change in that score since the previous Health Score calculation. Those lovely UX folks even added some fancy-looking stock ticker icons.

If I were still a production DBA, the first thing I would do in the morning (after getting coffee, of course) would be to look at this Delta view. I’d list the targets whose scores dropped and start figuring out why. If I had enough time, I’d probably look at the ones that went up as well.
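To make that morning routine concrete, here is a minimal Python sketch of the Delta arithmetic, assuming each target’s Health Score is available as a number from two consecutive calculations. The server names, scores, and the `health_deltas` and `trend_label` helpers are hypothetical illustrations, not SQL Sentry’s implementation or API.

```python
from typing import Dict


def health_deltas(previous: Dict[str, float], current: Dict[str, float]) -> Dict[str, float]:
    """Return current minus previous score for every target present in both snapshots."""
    return {
        target: current[target] - previous[target]
        for target in current
        if target in previous
    }


def trend_label(delta: float) -> str:
    """Map a delta to a ticker-style label (hypothetical stand-in for the UI icons)."""
    if delta > 0:
        return "UP"
    if delta < 0:
        return "DOWN"
    return "FLAT"


# Hypothetical per-target scores from two consecutive Health Score calculations.
previous = {"SQLPROD01": 72.0, "SQLPROD02": 88.0, "SQLDEV01": 55.0}
current = {"SQLPROD01": 62.0, "SQLPROD02": 91.0, "SQLDEV01": 55.0}

deltas = health_deltas(previous, current)

# Sort the biggest drops to the top so the "what got worse?" list comes first.
for target, delta in sorted(deltas.items(), key=lambda item: item[1]):
    print(f"{trend_label(delta):>4} {target}: {delta:+.0f}")
```

Sorting ascending by delta puts the targets that lost the most points first, which mirrors the “start with what went down” habit described above.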

Win With It

If you’re a computer/video/board game player like me, you always want to achieve the highest score possible. I’ve heard many fun stories of SQL Sentry customers gamifying this metric with their teams and leveling it up. That’s commendable, but note that getting the environment-level Health Score up to 100 is nearly impossible. With that in mind, here’s the strategy we recommend for making this metric relevant and actionable.

First, establish your baseline. Pick a representative stretch of a few days or a week for your environment and record the score over that period.
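As a rough illustration of recording a baseline, the Python sketch below averages a week of environment scores and compares a new reading against that average. The daily values and the `baseline` and `compare_to_baseline` helpers are hypothetical; averaging is just one simple way to summarize a baseline window, not a method SQL Sentry prescribes.

```python
from statistics import mean
from typing import List


def baseline(scores: List[float]) -> float:
    """Average environment Health Score over a representative window."""
    if not scores:
        raise ValueError("need at least one recorded score")
    return mean(scores)


def compare_to_baseline(today: float, scores: List[float]) -> float:
    """Points above (positive) or below (negative) the recorded baseline."""
    return today - baseline(scores)


# Hypothetical daily environment-level scores recorded over one representative week.
week = [64.0, 66.0, 63.0, 65.0, 67.0, 64.0, 66.0]

today = 70.0
print(f"Baseline {baseline(week):.1f}, today {today:.1f} "
      f"({compare_to_baseline(today, week):+.1f} points)")
```

Comparing against a recorded window rather than a single day helps answer the “is this number good or bad for us?” question with your own history instead of an absolute target like 100.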

Second, drill into your lowest-scoring individual targets. If there are alerts you can address, start diagnosing them! It might feel a lot like Whack-A-Mole at first, but with time and effort, you can improve your environment just by working through the alerts at hand.

Lastly, dial down the noise. If a group of alerts is firing but isn’t relevant, consider turning those alerts off or overriding/disabling them at the target level. We have some great resources on how to use inheritance to fine-tune your alerts and reduce some of the noise.

Closing

The Enhanced EHO feature is quickly evolving into a fantastic, concise, actionable, and (fingers crossed) prescriptive macro-level view of your SQL Servers. I am extremely excited about these changes; I wish I’d had the historical and delta views when I was a production DBA. I can’t wait to see how we continue to evolve it, and I’d love to hear your feedback as you try it out for yourself.

Until next time!

-BT
