6 Replies Latest reply on Sep 21, 2010 9:09 AM by AaronAlton

    SQL Blocking Due to Inefficient Query on Syslog

      Hello,

      About a week ago, I started receiving alerts from our SQL monitoring tool indicating heavy blocking on the dbo.SysLog table.  The blocking exists because of a parallel query which is doing a table scan:

      SPID at Head of Blocking Chain:

       

       

        SPID [ecid]:                       135 [0]

        Client Machine:                 (Our Solarwinds Server)                                                                                                                      

        Application:                      Orion Syslog Service                                                                                                           

        Login Name:                     SolarWinds                                                                                                                     

        Last Batch:                        9/7/2010 6:55:29 PM

        Wait Type:                        CXPACKET  << Indicates that the parallel query is waiting on one of the threads to finish

        Wait Time:                        00:00:00  << Ignore this; it represents how long *this* query has been waiting.

        Database:                         Solarwinds

        Command Text:

      UPDATE SysLog SET Hostname = @Hostname

      WHERE (DateTime >= @DateTime)

      AND (IP = @IP)

      AND (Hostname = @IP)

      AND (EngineID = @EngineID )

       

      We're not aware of any changes made in the last week, and so we suspect that one of two things is going on here:

      • The query is running again and again for some reason, and wasn't running previously, or
      • The query has always run on a regular basis, and the data volume in the dbo.SysLog table has caused the SQL Server query optimizer to select a different query plan (a parallel one, which isn't always the best choice).

      I'm leaning towards the latter.  I have a few ways that I can deal with it:

      • I can set the MAXDOP setting on the SQL instance to 1, prohibiting parallel query plans altogether.  This is a bit extreme, so I'd rather avoid it.
      • I can create a covering index so the query runs much more efficiently.  Probably the best option, but this is a big table, and I'd rather not add an extra GB of index bloat.  Also, a new index can occasionally adversely affect other queries which aren't currently running into trouble
      • I can more frequently delete history from the SysLog table and hope my Network Engineer doesn't notice ;)
      • I can break out the table and/or indexes into multiple file groups, so the table scan runs more quickly.  It'll still be doing a table scan though, so this is probably just delaying the inevitable.
      Before I do anything though, I want to make sure that the query above is legit.  There are a few things that strike me as odd about it.  First, why in the world do we need to update the HostName column for entire batches of records on a continuum?  Second, the query is comparing the HostName field to the @IP variable, and unless our hostname matches our IP address (and I sincerely hope that it does not), this will never return any results.
      Can anyone shed some light on what is going on here?

       Thanks much,
      Aaron 

        • Re: SQL Blocking Due to Inefficient Query on Syslog

          Hi Aaron--

          Have you opened a support ticket?

          M

          • Re: SQL Blocking Due to Inefficient Query on Syslog
            Karlo.Zatylny

            Hi Aaron,

            First to answer your specific questions:

            1. This is definitely unexpected behavior.  We would expect that we can resolve the hostname of an IP address so that when we insert entries into the syslog table that the initial entry has the proper hostname.  The continual running of this query is likely caused by new devices (IP addresses) sending syslogs to Orion so that the syslog service thinks it needs to look up these IP addresses in DNS and then update the syslog entries in the database.  However, you would be able to tell us if you see a pattern as to what devices are causing this update statement to execute so frequently.

            2. We set the HostName column to the IP address of the device when we do not already know the hostname for the IP address.  Hence, this will return results for entries that we have not previously updated the hostname column for.

            3. For this I give a more general explanation:

            This query gets executed from the syslog service when it receives a syslog from an IP address that it does not have a Hostname for.  The syslog service then asks DNS to resolve this IP address so that it can update the entries in the syslog table that it needs to with the resolved Hostname.  It is very possible that there are new devices in the network sending syslogs that are causing this issue to pop up.

            Have you run any profiling or looked at the execution plan to tell which column is causing the table scan in your environment?

            The syslog service keeps track of addresses it has already resolved so that it can insert the entry the first time and not perform this UPDATE statement later.  This cache is set to timeout every 60 minutes by default and is set by the DNSCacheTimeOut entry in the SyslogService.exe.config file.

            With the blocking you are describing, I would be curious as to which IP addresses in your system are seeing these frequent updates?  Are they not being resolved by the DNS properly?  Is there a pattern to the devices that are causing these updates?  Would it make sense to perhaps increase the DNSCacheTimeOut configuration in the Orion environment?  (Maybe these are questions for the network engineer.)

            I would think that our current indexes that include the DateTime and IP address columns would already help this SQL statement (IX_Syslog), so I am interested in what is causing the table scans.

            Thanks

              • Re: SQL Blocking Due to Inefficient Query on Syslog

                Apologies for the delayed response.  Your post is very detailed and helpful.  I have pulled a listing of the IP's in our SysLog table which aren't resolving at present - we'll get DNS entries created for them and see if the query continues to run.

                In response to your question regarding the execution plan, yes I did have a look at it.  As noted in my post, the plan isn't the best one for the UPDATE statement that we're seeing, however I wanted to ensure that the statement was legit first.

                I'll post back once we update DNS and let you know how it goes.