This discussion has been locked. The information referenced herein may be inaccurate due to age, software updates, or external references.
You can no longer post new replies to this discussion. If you have a similar question you can start a new discussion in this forum.

Using DPA to find what queries are running at a particular point in time

Hi have a slight issue in that I am not a DBA so I'm not 100% sure what to look for. We had an issue last night on one of our main BI DBs where IO to the Temp DB drive went nuts;

Log Name:  Application
Source:    MSSQLSERVER
Date:      19/08/2019 21:25:00
Event ID:  833

Task Category: Server

Level:     Information
Keywords:  Classic
User:      N/A
Computer:  Our DB Server

Description:

SQL Server has encountered 4657 occurrence(s) of I/O requests taking longer than 15 seconds to complete on file [L:\Data-Files\tempdb3.mdf] in database [tempdb] (2).  The OS file handle is 0x00000000000016C0.  The offset of the latest long I/O is: 0x0000032c950000

Event Xml:

<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">

  <System>

<Provider Name="MSSQLSERVER" />
<EventID Qualifiers="16384">833</EventID>
<Level>4</Level>
<Task>2</Task>
<Keywords>0x80000000000000</Keywords>
<TimeCreated SystemTime="2019-08-19T20:25:00.446407000Z" />
<EventRecordID>1593109</EventRecordID>
<Channel>Application</Channel>
<Computer>Our DB server</Computer>
<Security />

  </System>

  <EventData>

<Data>4657</Data>
<Data>15</Data>
<Data>L:\Data-Files\tempdb3.mdf</Data>
<Data>tempdb</Data>
<Data>2</Data>
<Data>00000000000016C0</Data>
<Data>0x0000032c950000</Data>
<Binary>410300000A000000100000004700530057004400420053003000310041002D0043002D00530051004C00000000000000</Binary>

  </EventData>

</Event>

11 minutes later the Windows cluster service (RHS.EXE) decided MS SQL had failed and shut the service down and restarted it. The resource is already configured to have a 30 minute heart beat delay timer before kicking off such action so, what ever happened sounds pretty severe.

I know we have DB Structural issues (all of Temp DB's DB Files are on a single LUN and the storage provider has highlighted that this is outside of their best practice guidelines and suggested this be split across LUNs), but using dpa (which we have, albeit on 11.1.457) is there a way to identify what queries were running around this time so i can try and ID what potentially could be causing Temp DB to get hammered?

I've tried generating a report to provide top 15 sql queries within the time frame and it provides two queries but it doesnt say when those queries started and whether they succesfully finished (as, logically the query that may be responsible for this event wouldnt have finished succesfully.

Ideally i would like to see a list of queries that were running at the time of the event, preferably with start date/times.

As i'm not a DBA (we dont have one), and BI team does not know themselves whats running around these times trying to get to the root cause of the problem is, challenging ... Management as per normal want answers and are asking me to prove that us not following the storage vendor's best practice is definitely the issue (obvious answer of , just implementing the best practice guidelines then seeing what happens ..... ), we've had these issues now for the past couple of months.

Regards

Somewhat frustrated jack of all trades master of nothing ¬.¬

  • This one may take a little poking around (likely need to look at a few things).

    Also, can you check to see if any backups or dbcc checkdb kinds of jobs are scheduled for during the time frame this occurred?

    I think at this point, a screen share would be best. Have you engaged support to help read the tea leaves?

    I could potentially take a quick look as well...

  • Backups werent running at this time, I beleive an index job is run on Thursdays but no one here is aware of any other specific DBCC query that would run at this time.