The Many Faces Using SolarWinds Database Performance Analyzer at SolarWinds

Background

This blog started out as an examination of how SolarWinds uses Database Performance Analyzer (DPA) within our own production environment. It now covers not only how our DBA uses DPA, but also how other business units within SolarWinds use it and why. It isn’t surprising to find people in IT operations and application development using DPA, since our own customer studies have shown a high number of people outside of the DBA role use it, too.

Recent product-specific studies for DPA showed a high number of DevOps/IT Ops and AppDev roles using the product and an eye-opening, broad customer census exposed even more. In the 2019 THWACK Member Census, we asked over 2,200 IT professionals to select their primary job role and only 2.4% selected DBA. Interestingly, when we asked respondents if they managed or monitored databases, 42.7% said yes.
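To put those two percentages side by side, here is a quick back-of-the-envelope calculation. It assumes exactly 2,200 respondents (the census says "over 2,200"), so the head counts are approximations rather than published figures.

```python
# Rough arithmetic on the census figures quoted above.
# Assumption: "over 2,200" is treated as exactly 2,200 respondents,
# so both counts are approximate.
respondents = 2200
dba_share = 0.024         # selected DBA as their primary job role
manages_db_share = 0.427  # said they manage or monitor databases

dba_count = round(respondents * dba_share)
manages_db_count = round(respondents * manages_db_share)
ratio = manages_db_share / dba_share

print(f"Primary-role DBAs: about {dba_count}")
print(f"Manage or monitor databases: about {manages_db_count}")
print(f"Ratio: roughly {ratio:.1f}x")
```

Under that assumption, the people who touch databases outnumber the people with a DBA title by well over ten to one.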

This brings to light the discussion of the “accidental DBA” and some interesting changes in IT organizations. First is the growth in the number of DevOps people who handle database-related tasks. Second is the importance of databases as the platform for most mission-critical applications and why everyone has a keen interest in their availability and performance. And last, but not least, the number of DBAs is going down according to Computer Economics, which has seen the percentage of DBAs relative to total IT staff drop to 2.8% in 2017 from 3.3% in 2013. Our own Head Geek™, Thomas LaRock, wrote an article pointing out the number of DBA jobs has stagnated for almost 20 years. On the flipside, Gartner pointed out that DBMS (Database Management Systems) revenue grew an astounding 18.4% to $46 billion in 2018.

Armed with this information, I decided I’d investigate the SolarWinds DBA team and see if any of these trends held true.

Let’s Start With the DBA

As I mentioned, I initially thought I’d interview the DBA team here at SolarWinds to see how we “drink our own champagne,” since I knew DPA was used by our internal IT team. As it turns out, the “DBA Team” is one person. I guess for a company that did $833 million in revenue in 2018 I expected an entire DBA organization, not just one hardworking DBA. But maybe this isn’t the exception?

I learned a lot from our DBA about how she keeps track of over 250 Microsoft SQL Server databases running on a mix of physical and virtual machines. My biggest takeaway from talking to her was that DBAs don’t “monitor databases.” They want to be alerted when there are problems, and they need a product to help them quickly find and resolve those problems when they arise. They also want a product to help them optimize their databases proactively.

The first thing we discussed was “what’s important and who is it important to?” Here are the top things SolarWinds uses DPA for and the primary users:

  • Overall database health: DBA and IT Ops
  • Debugging after deployment: AppDev and DBA
  • Ad-hoc troubleshooting: DBA and AppDev
  • Capacity planning: DBA

After I learned about the overall database environment (250+ SQL Server databases), I wanted to understand specific, real-world use cases of DPA in action.

DBA Usage Scenarios

So how does the DBA at SolarWinds use DPA? First, she sets up alerts, so she can immediately be sent text notifications from DPA if something goes awry. DPA has had alert notification for a while, but the 2019.4 release made it even easier via a “drag and drop” interface, making alert customization simple. Second, DPA is the first place she goes to when she gets notified about something going wrong, whether it’s an alert, phone call, email, or a help desk ticket opened and assigned to her.

Scenario 1 of 2

In this first real-life scenario, our DBA was alerted to an “assertion check fail” pointing to possible corruption. The SQL Server instance itself created a hard-to-decipher stack dump, and the only useful thing she could pick out of it was the process ID.

With this in hand, she went into DPA to the specific time the event occurred in the SQL Server instance. Since DPA provides both real-time and historical data, she was able to drill down to find 1) the session ID executing this query, and 2) the SQL script running and the database. After speaking with the developer who ran the query, she determined it was a problem with SQL Server itself and asked the developer to refrain from running the query until they got the problem resolved by Microsoft.

pastedImage_0.png

*Screenshot the SolarWinds DBA used to find the culprit of the stack dump SQL Server generated.

Scenario 2 of 2

This second use case brings to light how important DPA is for establishing the overall health of a database and for capacity planning. Our DBA could not stress enough how important it was for her to know the baseline of a database instance and associated queries. From the baselines DPA develops, with the help of machine learning, she can know what a typical day looks like and the behavior of typical database activity. This allows her to spot both anomalies and trends.
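To make the baseline idea concrete, here is a minimal sketch of one simple way to flag an unusual hour against historical data. It is not how DPA's machine learning works (the post doesn't describe that); it is a plain standard-deviation check, and the function name, the threshold, and the wait-time numbers are all made up for illustration.

```python
from statistics import mean, stdev


def find_anomalies(baseline, today, threshold=3.0):
    """Return (hour, value) pairs from `today` that sit more than
    `threshold` standard deviations away from that hour's baseline mean."""
    anomalies = []
    for hour, value in today.items():
        history = baseline.get(hour, [])
        if len(history) < 2:
            continue  # not enough history to estimate the spread
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            continue  # perfectly flat history; skip rather than divide by zero
        if abs(value - mu) / sigma > threshold:
            anomalies.append((hour, value))
    return anomalies


# Hypothetical total wait time (seconds) per hour of day on previous days.
baseline = {
    9: [120, 130, 125, 118, 127],
    10: [200, 210, 190, 205, 195],
    11: [150, 145, 155, 160, 140],
}
# Hypothetical totals for the same hours today.
today = {9: 124, 10: 480, 11: 152}

print(find_anomalies(baseline, today))
```

With this sample data only the 10 o'clock hour stands out, which is the kind of "this isn't a typical day" signal a baseline gives you.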

Regarding capacity planning, she uses DPA to monitor the utilization and performance of applications and make note of trends she uses for future capacity requirements such as new or additional servers. Luckily, SolarWinds does a quarterly two-week freeze on new applications and changes, and this two-week period gives her a chance to go through DPA reports and proactively tune the environment. DPA’s anomaly detection powered by machine learning is a great way to graphically see the biggest opportunities for proactive optimization.

pastedImage_1.png

*This resource tab in DPA is a favorite of our DBA because it gives her a good overview of server resources being used.
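The capacity-planning side of this can be sketched with a simple trend line. The example below fits a straight line to weekly CPU utilization and estimates how many weeks remain before a chosen ceiling is reached. The numbers, the 80% ceiling, and the helper names are assumptions for illustration only, and real workloads are rarely this linear.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def weeks_until(threshold, slope, intercept, current_week):
    """Weeks from `current_week` until the fitted line crosses `threshold`.
    Returns None when utilization is flat or falling."""
    if slope <= 0:
        return None
    week_hit = (threshold - intercept) / slope
    return max(0.0, week_hit - current_week)


# Hypothetical average CPU utilization (%) per week for one server.
weeks = [0, 1, 2, 3, 4, 5]
cpu = [40.0, 42.0, 44.0, 46.0, 48.0, 50.0]

slope, intercept = linear_fit(weeks, cpu)
remaining = weeks_until(80.0, slope, intercept, weeks[-1])
print(f"Growth: {slope:.1f} points/week, weeks until 80%: {remaining:.0f}")
```

A projection like this is only a starting point for the conversation about new or additional servers, but it turns "usage is creeping up" into a rough date.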

Our DBA believes DPA will be even more useful as SolarWinds starts to migrate databases to Azure PaaS. As she stated, the need to stay on top of performance issues like poorly written SQL and poorly performing tables doesn’t go away, and mistakes, especially those consuming resources, can lead to spikes in usage charges.

Application Development and DPA

As I mentioned at the beginning, I learned a lot about how DPA is used at SolarWinds and the various people and departments using it. The application development (AppDev) team is one of the bigger teams in need of the data DPA provides. Why? Because they, along with our DBA, are constantly deploying changes and want to see the difference.

For example, is the SQL query running slower or faster than before? As previously mentioned, some people are “accidental DBAs,” so if the query they implemented ran fine on a QA instance but in production performs poorly, they need to know why. Case in point, this exact scenario happened recently and was due to a missing index DPA quickly pointed out. As our DBA stressed, for someone not very experienced with index recommendations, the tuning advisors in DPA can be a life-saver.

pastedImage_7.png

*One of the most popular DPA pages for before-and-after comparisons is also the one used to look at overall waits, which makes it great for seeing how performance changed after a deployment.
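For anyone who hasn't seen what a missing index does to a query, here is a small, self-contained illustration. It uses Python's built-in sqlite3 module rather than SQL Server or DPA, and the table, column, and index names are invented, so treat it as a sketch of the general idea and not a reproduction of the incident above.

```python
import sqlite3

# In-memory SQLite database standing in for a production table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(10000)],
)

query = "SELECT total FROM orders WHERE customer_id = ?"


def plan(connection, sql, params):
    """Return the query plan as text; the last column of each row is the detail."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)


before = plan(conn, query, (42,))  # no index on customer_id yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))   # same query, now able to use the index

print("before:", before)
print("after: ", after)
```

The exact plan wording varies by SQLite version, but the first plan reports a scan of the whole table while the second reports a search using idx_orders_customer. That is the same before-and-after difference a tuning advisor is pointing at when it recommends an index.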

Finally, IT Operations

At SolarWinds, IT Operations (IT Ops) is where the buck stops for overall system availability, and just like our DBA, they make extensive use of alerts. Depending on the alert, they may send a priority 3 email when something has reached a certain threshold. But if SQL Server were down, they would send an email as well as page Opsgenie, which then goes to the primary person on call and posts a message on Microsoft Teams. The IT Ops group also has certain alerts integrated with SolarWinds® Service Desk to automatically open tickets.
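The escalation logic described above boils down to a small routing table. The sketch below models it in a few lines. The priority numbering, the channel labels, and the function name are assumptions, and nothing here calls the actual Opsgenie, Microsoft Teams, or DPA APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    priority: int  # assumption: 1 is most urgent, 3 is a routine threshold alert


def route(alert):
    """Return the notification channels an alert should fan out to.
    Channel names are plain labels, not real integrations."""
    if alert.priority == 1:
        # Outage-level alerts: email plus a page, which in turn reaches
        # the on-call person and posts to the team chat.
        return ["email", "opsgenie_page", "teams_message"]
    # Everything else in this sketch is handled by email only.
    return ["email"]


outage = Alert("SQL Server down", priority=1)
threshold = Alert("Threshold reached", priority=3)

print(outage.name, "->", route(outage))
print(threshold.name, "->", route(threshold))
```

Keeping the routing rules this explicit makes it easy to see, and to argue about, which alerts deserve to wake someone up.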

But what about databases and their health…does IT Ops care? The answer is yes because they rely on the DPA integration with SolarWinds Server & Application Monitor (SAM) to find the cause of performance issues on servers or when someone complains about application performance. Since DPA and SAM integrate with the SolarWinds Orion® Platform, you can navigate seamlessly between the products.

For example, they used the SAM integration to track a CPU spike on a server to a SQL Server database instance in a critical state. In this case, they immediately reached out to the SolarWinds DBA because they could tell the issue with the server was related to the database. However, if the DBA is unavailable, they rely on the suggestions and recommendations in DPA to diagnose the problem and take action or provide further documentation for either our DBA or AppDev.

Just as DBA and AppDev look for signs of abnormality, IT Ops looks at historical trends to find issues that may correlate to database issues. The integration of SAM and DPA makes this simple.

pastedImage_8.png

*IT Ops uses this page in Server & Application Monitor to see trends and then drill down and isolate the root cause. SAM’s integration with DPA makes this simple.

Summary

As stated in the introduction, the role of the DBA is changing and many people without a DBA title are involved with the performance of database applications. With the movement of database instances to IaaS and PaaS implementations, the ability to optimize, find, and resolve performance issues doesn’t go away. In some ways it becomes more important due to the potential impact on OpEx (aka your monthly Azure bill).

Comment
  • Even as a non-DBA the time I installed this product I immediately found issues that could quickly be resolved. For me the more products that can be pulled under one umbrella the better as you then start to have a single point of truth. So many environments have multiple points where data is accumulated and it becomes difficult to know which one is correct. The bigger issue is correlating the information between disparate systems.
