4 Replies Latest reply on May 23, 2018 11:05 PM by gnorris

    Database integrigy check monitoring

    tszilagyi

      Does anyone have some type of monitoring and alerting for database integrity checks on their SQL servers. 

       

      I am trying to setup alerts right now for last backup and issues found with the DBCC.  I was able to find an item in the app insight for SQL that reports on the # of days since the last backup, but i have not found anything that seems to report on the results of the DBCC.  Any ideas on how to get this setup as easily as possible would be appreciated.

        • Re: Database integrigy check monitoring
          mesverrum

          Doesn't a dbcc check failure report something to the SQL error log that you can report on?  I've definitely created dashboards and reports before so i know the info is in there somewhere.

            • Re: Database integrigy check monitoring
              gnorris

              Here is a sample log entry for a DBCC check. I've tried writing a scan for results but it is very dynamic and not simple to do with SQL query.

               

              DBCC CHECKDB (DemoDatabase) WITH all_errormsgs, no_infomsgs, data_purity executed by domain\svc_SQL_AGNT found 0 errors and repaired 0 errors. Elapsed time: 0 hours 0 minutes 1 seconds.  Internal database snapshot...(more) 

               

              So yes we can look for "Found n errors" and extract a number - then need to get and compare "repaired n errors" to see if it differs - then we need to know.

              Of course with (databaseName) being of variable length the search string gets messy. especially if a database is called 'xyZerrors' or 'foundItems'  it rally sets the cat among the pidgeons.

               

              Which is easiest? to build a test for this or get Microsoft to change a multi-year-old process?

              ANY help appreciated.

                • Re: Database integrigy check monitoring
                  mesverrum

                  Well I don't believe sql has any existing counter for numbers of errors found or repaired in DBCC processes, so therefor Solarwinds can't pull the value of that counter directly, so yeah you are going to have to script it one way or another.

                   

                  If I was going to do this I would probably use a combination of posh and sql scripting to gather what I wanted and get it into solarwinds, but that is just because those are the places I'm most comfortable, lots of other approaches you could take.  I don't recall off the top of my head which tables contain the sql error logs but to parse out the number of errors I would try something to the effect of:

                   

                  select substring(errormessage, (patindex('%found % errors%')+6), (patindex('%found % errors%')+6 - patindex('% errors and repaired%'))

                  from [whatever table contains error logs]

                  where datetime = (select max(datetime) from [whatever table contains error logs] where errormessage like 'dbcc checkdb%found%errors%repaired%')

                   

                  Use a similar type of logic to figure out the number of repairs, then you could either take those values directly into solarwinds with a sql user experience monitor or you could get fancier and identify specifically the difference between the two and only include entries where there is a difference.

                   

                  Keep in mind that I'm not on VPN and that is just off the top of my head so double check your syntax and such, but conceptually it should work.

                  If the server has multiple databases you might need to come up with a scheme to break those apart but that should get you in the right ball park.

                • Re: Database integrigy check monitoring
                  gnorris

                  The example above is taken from the SQL Error log.

                  As I point out - the issue is determining how to determine an error.. The DBCC CHECKDB process outputs a log entry regardless of the outcome - good or bad.

                  So, the only way to determine an error is to scrape the output for where it does not have "Found 0 errors"