This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Runaway SQL query

Not sure if this is the forum to post this, but it seems to be the best fit.

We recently upgraded Solarwinds to 10.5.0.79.  One of the Solarwinds stored procedures [dbm_TrapMessages_DeleteStale] is performing abysmally, taking up 100% CPU time on our SQL Server.

Anyone else noticed this?

One of the culprits is this command:

SELECT TOP 10000 TrapID FROM Traps WHERE DateTime < @date

As there is no index on the [DateTime] column, SQL Server scans through the table instead (actually a clustered index scan, which amounts to the same thing).  I believe adding a nonclustered index on the [DateTime] column in the [Traps] table will improve performance greatly.  I'll be trying it myself within the next week or so.

There is another command in that stored procedure that doesn't make much sense:

SELECT TrapVarBinds.TrapID
FROM TrapVarBinds
LEFT JOIN Traps ON TrapVarBinds.TrapID=Traps.TrapID
WHERE Traps.TrapID IS NULL

The WHERE predicate will never be true as the [TrapID] column in both the [Traps] and [TrapVarBinds] tables are explicitly defined to not allow nulls.  Again, this is causing extra table scanning.  The WHERE clause should be removed from this statement.

Our [Traps] table has almost 3 million rows.  Our TrapVarBinds table has over 42 million rows.

 

Hopefully this will be addressed in a future release.....

 

Scott

  • Scott,

    Thanks for your feedback. What SolarWinds products do you have installed (Toolset, Orion, Kiwi, etc)? You reference an upgrade to 10.5.0.79 which would indicate the SolarWinds Toolset product. However, those stored procedures do not exist in the Toolset product. If we can identify which product you are having trouble with we can better route your comments to the appropriate team to get the issue resolved.

    --Steve

  • From our techs:

     

    Main Orion Server Details
     
     
    Orion
    Module Name:
    Orion Core
    Version:
    2010.1.0
    Service Pack:
    None
     
     
    APM
    Module Name:
    Application Performance Monitor
    Version:
    3.5
    Service Pack:
    None
     
     
    NPM
    Module Name:
    Network Performance Monitor
    Version:
    10.0.0
    Service Pack:
    None

  • Can you please open a support ticket so we can investigate and see what is going on

  • One of our techs is going to do so.

     

    Thanks.

  • This workaround works quite well:

    CREATE NONCLUSTERED INDEX [IX_T_D]
    ON [dbo].[Traps]
    (
      [DateTime]
    )
    WITH (FILLFACTOR = 70);
    GO

    CREATE TABLE #tmpdel ([TrapID] bigint);

    DECLARE @date datetime, @rowcountvar bigint;
    --NB deletes everything up to 24 hours ago from now
    --   change the -1 to number of days previous data to keep
    SET @date = DATEADD(d, -1, GETDATE());

    Nextchunk:
    TRUNCATE TABLE #tmpdel;
    INSERT INTO #tmpdel ([TrapID])
      SELECT TOP 10000 [TrapID] FROM [dbo].[Traps] WHERE [DateTime] < @date;

    DELETE FROM [dbo].[Traps]
    WHERE [TrapID] IN (SELECT [TrapID] FROM #tmpdel);

    SELECT @RowCountVar = @@ROWCOUNT;

    DELETE FROM [dbo].[TrapVarBinds]
    WHERE [TrapID] IN (SELECT [TrapID] FROM #tmpdel);

    IF @RowCountVar = 10000 GOTO NextChunk;

    DROP TABLE #tmpdel;
    GO

     

    Wasn't too taxing on the CPU, finished in under 15 minutes for a 44 million row TrapVarbinds table & 5 million row Traps table.

     

    Edit: If regular maintenance had to be unceremoniously halted (we had to restart our SQL Server to kill the running maintenance), you will end up with the TrapVarBinds table full of orphaned records (i.e. no matching TrapID in the Traps table).  The following command will delete those orphaned records in batches of 1,000,000 (you'll need to run it multiple times, set up a loop, or just remove the TOP (1000000) bit):

    DELETE TOP (1000000) FROM [dbo].[TrapVarBinds]
    WHERE [TrapID] NOT IN (SELECT [TrapID] FROM [dbo].[Traps]);

    This takes about 10-20 seconds per batch on our system (64-bit VM with 6GB RAM, SAN-attached).  Your mileage may vary.  I suggest starting with a batch of 10,000 first to monitor impact on system (especially if it is shared with other systems).

  • It seems to max degree of parallelism setting can have an adverse effect as well.  I was looking at query plans for the delete process and found that when max degree of parallelism was set to 0 (default), a parallel query plan was being created and it would be doing an index scan of IX_TrapNodeID on the Traps table rather than using the primary key index.  Setting max degree of parallelism to 1 changes this (effectively allows any query to only use 1 CPU) and makes the subquery used in the delete a lot more efficient (as far as I can tell).  This also depends on the number of rows in the Traps & TrapVarBinds tables.