6 Replies Latest reply on Jun 8, 2010 3:42 PM by swduncan

    Runaway SQL query

      Not sure if this is the forum to post this, but it seems to be the best fit.

      We recently upgraded Solarwinds to 10.5.0.79.  One of the Solarwinds stored procedures [dbm_TrapMessages_DeleteStale] is performing abysmally, taking up 100% CPU time on our SQL Server.

      Anyone else noticed this?

      One of the culprits is this command:

      SELECT TOP 10000 TrapID FROM Traps WHERE DateTime < @date

      As there is no index on the [DateTime] column, SQL Server scans through the table instead (actually a clustered index scan, which amounts to the same thing).  I believe adding a nonclustered index on the [DateTime] column in the [Traps] table will improve performance greatly.  I'll be trying it myself within the next week or so.

      There is another command in that stored procedure that doesn't make much sense:

      SELECT TrapVarBinds.TrapID
      FROM TrapVarBinds
      LEFT JOIN Traps ON TrapVarBinds.TrapID=Traps.TrapID
      WHERE Traps.TrapID IS NULL

      The WHERE predicate will never be true as the [TrapID] column in both the [Traps] and [TrapVarBinds] tables are explicitly defined to not allow nulls.  Again, this is causing extra table scanning.  The WHERE clause should be removed from this statement.

      Our [Traps] table has almost 3 million rows.  Our TrapVarBinds table has over 42 million rows.

       

      Hopefully this will be addressed in a future release.....

       

      Scott

        • Re: Runaway SQL query
          SteveSW

          Scott,

          Thanks for your feedback. What SolarWinds products do you have installed (Toolset, Orion, Kiwi, etc)? You reference an upgrade to 10.5.0.79 which would indicate the SolarWinds Toolset product. However, those stored procedures do not exist in the Toolset product. If we can identify which product you are having trouble with we can better route your comments to the appropriate team to get the issue resolved.

          --Steve

            • Re: Runaway SQL query

              From our techs:

               

                     

              Main Orion Server Details

               

               

              Orion

              Module   Name:

              Orion   Core

              Version:

              2010.1.0

              Service   Pack:

              None

               

               

              APM

              Module   Name:

              Application   Performance Monitor

              Version:

              3.5

              Service   Pack:

              None

               

               

              NPM

              Module   Name:

              Network   Performance Monitor

              Version:

              10.0.0

              Service   Pack:

              None

            • Re: Runaway SQL query
              bshopp

              Can you please open a support ticket so we can investigate and see what is going on

                • Re: Runaway SQL query

                  One of our techs is going to do so.

                   

                  Thanks.

                    • Re: Runaway SQL query

                      This workaround works quite well:

                      CREATE NONCLUSTERED INDEX [IX_T_D]
                      ON [dbo].[Traps]
                      (
                        [DateTime]
                      )
                      WITH (FILLFACTOR = 70);
                      GO

                      CREATE TABLE #tmpdel ([TrapID] bigint);

                      DECLARE @date datetime, @rowcountvar bigint;
                      --NB deletes everything up to 24 hours ago from now
                      --   change the -1 to number of days previous data to keep
                      SET @date = DATEADD(d, -1, GETDATE());

                      Nextchunk:
                      TRUNCATE TABLE #tmpdel;
                      INSERT INTO #tmpdel ([TrapID])
                        SELECT TOP 10000 [TrapID] FROM [dbo].[Traps] WHERE [DateTime] < @date;

                      DELETE FROM [dbo].[Traps]
                      WHERE [TrapID] IN (SELECT [TrapID] FROM #tmpdel);

                      SELECT @RowCountVar = @@ROWCOUNT;

                      DELETE FROM [dbo].[TrapVarBinds]
                      WHERE [TrapID] IN (SELECT [TrapID] FROM #tmpdel);

                      IF @RowCountVar = 10000 GOTO NextChunk;

                      DROP TABLE #tmpdel;
                      GO

                       

                      Wasn't too taxing on the CPU, finished in under 15 minutes for a 44 million row TrapVarbinds table & 5 million row Traps table.

                       

                      Edit: If regular maintenance had to be unceremoniously halted (we had to restart our SQL Server to kill the running maintenance), you will end up with the TrapVarBinds table full of orphaned records (i.e. no matching TrapID in the Traps table).  The following command will delete those orphaned records in batches of 1,000,000 (you'll need to run it multiple times, set up a loop, or just remove the TOP (1000000) bit):

                      DELETE TOP (1000000) FROM [dbo].[TrapVarBinds]
                      WHERE [TrapID] NOT IN (SELECT [TrapID] FROM [dbo].[Traps]);

                      This takes about 10-20 seconds per batch on our system (64-bit VM with 6GB RAM, SAN-attached).  Your mileage may vary.  I suggest starting with a batch of 10,000 first to monitor impact on system (especially if it is shared with other systems).

                        • Re: Runaway SQL query

                          It seems to max degree of parallelism setting can have an adverse effect as well.  I was looking at query plans for the delete process and found that when max degree of parallelism was set to 0 (default), a parallel query plan was being created and it would be doing an index scan of IX_TrapNodeID on the Traps table rather than using the primary key index.  Setting max degree of parallelism to 1 changes this (effectively allows any query to only use 1 CPU) and makes the subquery used in the delete a lot more efficient (as far as I can tell).  This also depends on the number of rows in the Traps & TrapVarBinds tables.