Level 7

APM 2.5 SQL 2005 DB template - SQL Page Reads/sec and writes/sec constantly going critical

My setup:

Orion SLX server (APM 2.5, NPM, NCM, NTA 3.1, VOIP)

MS SQL 2005 server - housing NPM, NCM databases (runs as standalone SQL server)

I've enabled the APM template for SQL 2005 application monitoring on my Orion DB SQL server. I'm constantly getting these messages:

"SQL Page Reads per Second for SQL Server 2005 Database on INLMDB02 is in a critical state"

"SQL Page Writes per Second for SQL Server 2005 Database on INLMDB02 is in a critical state"

This keeps putting the SQL application in a critical state on the status pages. The default critical threshold for both of these component monitors is 100. I'm frequently seeing page writes above 100/sec, and page reads constantly averaging between 5000/sec and 8000/sec.

Can anyone tell me what this implies about my DB server? It seems like it's not tuned properly. The SQL server has 4 GB of RAM, of which it's currently using 2 GB. I'm wondering if this points to the performance issues I'm seeing with Orion as it pulls data into views.

 

Any help is deeply appreciated.

7 Replies
Level 15

Hi,

You should RDP to the SQL Server and launch the Performance console:

Start > Run, type perfmon, and click OK.

Check the Avg. Disk Queue Length.

 

If the average value is constantly high, you might want to check what type of Disk Arrays you have setup on your SQL Server. "RAID 10 provides the highest read-and-write performance of any one of the other RAID levels, but at the expense of using two times as many disks."

 

Optimize Databases: http://msdn.microsoft.com/en-us/library/ms191149%28SQL.90%29.aspx

 

If the average value is ok, increase the thresholds of the components in your APM Application.

 

HTH,

Yann


I've run into the same thing... We see our Orion DB server's reads and writes going above 100 frequently. Is 100 a good value to base this data on?

Yann, you said if the average value is constantly high we might want to check the disk array type... What does "high" mean? Is 400 high? I set our App Monitor to watch the disk queue length on the server as well as the standard SQL stuff, and I'm seeing 0-250, topping out around 400-450 once or twice throughout the day.


What does "high" mean?  Is 400 high? 

I did not specify an explicit value because a value that is high on one server can be acceptable on another SQL Server, depending on the disk array.

I am copying below a better explanation of the disk performance counters than I could give myself:

It comes from the following website: http://www.sqljunkies.ddj.com/Article/D1B7C756-4725-4D31-A53D-C0A47976E6BB.scuk

EDIT 10July2009: Article moved here http://searchsqlserver.techtarget.com/generic/0,295582,sid87_gci1265212,00.html

 

I advise reading the whole article. It gives further hints for identifying disk bottlenecks.

 

Investigating the System Monitor

In the System Monitor there are five Physical Disk counters for each physical disk in the logical drive that are key to identifying an I/O bottleneck. Each disk should be investigated individually.

    1. Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.
    2. Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
    3. Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk.
    4. Disk Reads/sec is the rate of read operations on the disk.
    5. Disk Writes/sec is the rate of write operations on the disk.

First, on the System Monitor properties window, slide the time interval to only include the period of “slow performance.” Then calculate the I/Os per disk as follows.

Suppose you have a RAID 5 configuration with four physical disks, and you had the following information from the System Monitor.

Disk Reads/sec            120
Disk Writes/sec           150
Avg. Disk Queue Length     12
Avg. Disk Sec/Read       .035
Avg. Disk Sec/Write      .045

Raid Calculations:

Raid 0 -- I/Os per disk = (reads + writes) / number of disks
Raid 1 -- I/Os per disk = [reads + (2 * writes)] / 2
Raid 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
Raid 10 -- I/Os per disk = [reads + (2 * writes)] / number of disks

You can use the above RAID calculations to determine the I/Os per disk: [120 reads + (4 * 150 writes)] / 4 physical disks = 180 I/Os per disk. This is over the general rule of 125 I/Os per disk capacity. This system has a disk bottleneck.

You can determine that with 720 total I/Os, six physical disks would be required to resolve the disk bottleneck (720 / 125 = 5.76). Adding even more disks will provide room for growth. You should calculate for peak workloads and future growth.
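To make the arithmetic above concrete, here is a small Python sketch of the RAID calculations. The per-disk formulas, the sample counter values, and the 125 I/Os-per-disk rule of thumb all come from the article; the function name is my own.

```python
import math

def ios_per_disk(raid, reads, writes, disks):
    """Per-disk I/O load for a given RAID level, per the article's formulas."""
    if raid == 0:
        return (reads + writes) / disks
    if raid == 1:
        return (reads + 2 * writes) / 2
    if raid == 5:
        return (reads + 4 * writes) / disks   # each logical write costs 4 I/Os
    if raid == 10:
        return (reads + 2 * writes) / disks
    raise ValueError("unknown RAID level")

# Sample System Monitor values from the example: RAID 5 on 4 physical disks.
reads, writes, disks = 120, 150, 4
per_disk = ios_per_disk(5, reads, writes, disks)
print(per_disk)                       # 180.0 -- over the 125 I/Os/disk rule of thumb

# Total I/O load, and the disk count needed to stay at or under 125 I/Os each.
total_ios = reads + 4 * writes        # 720 total I/Os for RAID 5
disks_needed = math.ceil(total_ios / 125)
print(disks_needed)                   # 6 (720 / 125 = 5.76, rounded up)
```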

The Avg. Disk Queue Length is the total across all physical disks in the RAID configuration. In the example above, 12 avg. disk queue length / 4 physical disks = 3 queued I/Os per disk. Anything over 2 is reason to investigate further.

Next, look at the average disk seconds per transfer. In general, an Avg. Disk Sec/Read of 11-15 ms or lower is good, and an Avg. Disk Sec/Write of 12 ms or lower is good. Anything above these numbers is reason to investigate further.
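Putting the queue-length and latency rules together, a quick sanity check over the same sample numbers might look like the sketch below. The thresholds (2 queued I/Os per disk, 15 ms reads, 12 ms writes) are the article's rules of thumb; the helper function is my own.

```python
def disk_warnings(queue_len, disks, sec_read, sec_write):
    """Flag counters that exceed the article's rules of thumb."""
    warnings = []
    per_disk_queue = queue_len / disks
    if per_disk_queue > 2:
        warnings.append(f"queue: {per_disk_queue:.1f} queued I/Os per disk (>2)")
    if sec_read > 0.015:
        warnings.append(f"reads: {sec_read * 1000:.0f} ms average (>15 ms)")
    if sec_write > 0.012:
        warnings.append(f"writes: {sec_write * 1000:.0f} ms average (>12 ms)")
    return warnings

# Sample values: Avg. Disk Queue Length 12 over 4 disks, 35 ms reads, 45 ms writes.
# All three checks fire for this example, consistent with a disk bottleneck.
for w in disk_warnings(12, 4, 0.035, 0.045):
    print(w)
```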


Yann,

I would love to get some more information on this. Based on your post I believe that we may have a bottleneck problem but I am not sure exactly how to crunch the numbers.

I have included a photo of one sample interval for the statistics that you have recommended to collect. The numbers are a little higher in the snapshot than the average over a number of samples, however this should at least be a good starting point for discussion.

Thanks,


Thanks Yann!  I think this could be very helpful once I get time to troubleshoot a little more.

It should be interesting to see how this translates to a SAN. 


We are seeing this also on our Orion deployment that includes NTA. On our Orion DB that doesn't have NTA, everything runs smoothly.


Just as a valid test, I'll shut down the NTA service to see if I experience the same results. I've been told that we are just hammering our DB. The question is: what is the baseline for these types of metrics under low, medium, and heavy loads? Does anyone have something we can reference?
