AppInsight SQL Maxing out DB Server CPU

We are using AppInsight SQL to monitor our production database servers, as I am sure others are as well. On three separate occasions now, we have seen the query coming from Orion hang on the server for days on end, ultimately maxing out the CPU and requiring the DB server to be restarted.

As may be seen from the image below, there are two session IDs running on the monitored database server:

[Image: AppInsightSQL.JPG]

My DBA group tried countless times to kill these sessions but was unable to do so. Ultimately, to clear these queries and reclaim the CPU, the database server itself had to be restarted. Obviously this poses a HUGE problem, as restarting production machines in our environment (and yours too, I'm sure) is easier said than done. Un-managing the server within Orion does not stop the query either.

Each time we have seen this occur, it has been the same query. When run manually, this query always returns extremely quickly, and the session is closed out once it has run. Typically it behaves the same way when run from the Orion monitoring platform, but occasionally, and unpredictably, it does not.

So far I have opened up two cases on this:

Case# 619436 - Unable to come to a resolution. Requested we enable debug logging to gather more data. Case ultimately closed as the issue did not recur in time to proceed.

Case# 634478 - Problem occurred again, different server with same specs (Windows 2008, SQL 10.0.2573.0 SP1). Support still unable to come to a resolution. Apparently the debug logging does not allow them to go back far enough to when either of these hung queries was started.

So, at this point I have my management losing all faith in this product that was supposed to be a game changer in the way we monitor our environment. I have my DBA group wanting all SQL monitoring stopped so that we do not bring the server to a crawl. Application groups are also starting to question whether they want us to monitor their software now based on the effects AppInsight SQL has had on our DB servers. All the while, support cannot give me any answers. Awesome.

I caution everyone against using the AppInsight SQL monitoring, unless you do not care about adversely affecting your production environment.

In case anyone asks, our SolarWinds environment is as follows:

Primary Server: Windows 2012 R2

Additional Poller: Windows 2012 R2

Additional Web Server: Windows 2012 R2

Database Server: Windows 2008 - SQL 2012

  • SolarWinds has literally thousands of customers monitoring tens of thousands of SQL servers using AppInsight for SQL, none of whom are exhibiting this behavior. While I don't discount the possibility of this being a bug, your case is the only reported occurrence in over a year of AppInsight for SQL's existence. So this is obviously difficult to reproduce internally without additional detail. As it stands now, it sounds like this isn't easily reproducible in your own environment either, though it does recur. Case 634478 remains open and I will see that this case is escalated accordingly.

  • Did you recently update your installation of SAM? If so I may be able to help you.

  • I am somewhat pleased to know that we are alone in this issue, and at the same time saddened. Correct, this is certainly not easily reproduced. I suppose I was hoping that, given my original ticket and the fact that it took upwards of 30 days for the CPU to finally be maxed out by this lingering query, the steps I was given to enable the logging would have been sufficient. Nothing was mentioned about a limitation on this additional logging until the second case was opened, leaving us at this point with seemingly nowhere to go. Support is now reaching out for a phone call to discuss further, so hopefully that will prove insightful.

  • Unfortunately no, we have not performed any updates recently. Our last install (still the current version of each module we have) was a fresh one due to polling and reporting issues. I had opened support cases on these issues as well, but ultimately had to re-install to resolve them. There is a thread on here as well regarding the reporting errors, which has yet to be fully answered.

  • A similar problem has been described - "execution of the xp_readerrorlog command stops responding (hangs) and cannot finish. Additionally, the CPU usage of the CPU that is running the command increases to 100 percent."

    There is a hotfix from Microsoft - http://support.microsoft.com/kb/973524/en-us

    But it applies to versions prior to SQL 2008 R2.

  • I had the exact same issue as you.
    When AppInsight for MS SQL was first released, we applied it to three less-critical servers. CPU on one of the three leaked up to 100% over the course of a week. We could reproduce it all day long by removing the template and then adding it back onto the server.

    After upgrading SAM to v6.1.1, we have not seen the issue return.

  • If you are a SolarWinds employee then you need to go back to customer relations training. Never ever start a discussion by saying no one else is reporting a problem. Customers don't want to hear that. What they want to hear is an acknowledgement of the customer's problem and what you can do to help fix it. Saying what you said is just like telling him he doesn't know what he's doing.