Orion SLX server (APM 2.5, NPM, NCM, NTA 3.1, VOIP)
MS SQL 2005 server - housing NPM, NCM databases (runs as standalone SQL server)
I've enabled the APM template for SQL Server 2005 application monitoring of my Orion DB SQL server. I'm constantly getting messages like:
"SQL Page Reads per Second for SQL Server 2005 Database on INLMDB02 is in a critical state"
This keeps putting the SQL application into a critical state on the status pages. The default critical threshold for both of these component monitors is 100. I'm seeing page writes above 100/sec a lot, and page reads constantly averaging between 5,000/sec and 8,000/sec.
Can anyone tell me what this implies about my DB server? It seems like it's not tuned properly. The SQL server has 4 GB of RAM, of which it's currently using 2 GB. I'm wondering whether this points to the performance issues I'm seeing with Orion as it pulls data into views.
Any help is deeply appreciated.
You should RDP to the SQL Server and launch the Performance console:
Start > Run > perfmon > OK.
Check the Avg. Disk Queue Length.
If the average value is constantly high, you might want to check what type of Disk Arrays you have setup on your SQL Server. "RAID 10 provides the highest read-and-write performance of any one of the other RAID levels, but at the expense of using two times as many disks."
Optimize Databases: http://msdn.microsoft.com/en-us/library/ms191149%28SQL.90%29.aspx
If the average value is ok, increase the thresholds of the components in your APM Application.
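If you'd rather check the average over a window than eyeball the live graph, typeperf (the command-line counterpart of perfmon) can export the same counter to CSV, e.g. typeperf "\PhysicalDisk(_Total)\Avg. Disk Queue Length" -sc 30 -o queue.csv, and a few lines of script can average it. A minimal sketch, assuming the standard PDH-CSV layout (header row of counter names, then a quoted timestamp column followed by counter values); the sample capture and the machine name are illustrative only:

```python
import csv
import io

def average_counter(typeperf_csv: str) -> float:
    """Average the first counter column of a typeperf CSV capture."""
    reader = csv.reader(io.StringIO(typeperf_csv))
    next(reader)  # skip the header row listing the counter paths
    samples = [float(row[1]) for row in reader if len(row) > 1 and row[1]]
    return sum(samples) / len(samples)

# Hypothetical three-sample capture of Avg. Disk Queue Length:
sample_output = '''"(PDH-CSV 4.0)","\\\\INLMDB02\\PhysicalDisk(_Total)\\Avg. Disk Queue Length"
"07/10/2009 10:00:00.000","1.25"
"07/10/2009 10:00:01.000","3.75"
"07/10/2009 10:00:02.000","4.00"'''

print(average_counter(sample_output))  # 3.0
```

This keeps the spot-checks repeatable, so you can compare the same window before and after any tuning.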
I've run into the same thing... We see our Orion DB Server reads and writes going above 100 frequently. Is 100 a good value to base this data on?
Yann, you said that if the average value is constantly high, we might want to check the disk array type... What does "high" mean? Is 400 high? I set our App Monitor to watch the disk queue length on the server as well as the standard SQL counters, and I'm seeing 0-250, topping out around 400-450 once or twice throughout the day.
What does "high" mean? Is 400 high?
I did not specify an explicit value because a value that is high on one server can be acceptable on another SQL Server, depending on the disk array.
I am copying below a better explanation of the disk performance counters than I could write myself.
It comes from the following website: http://www.sqljunkies.ddj.com/Article/D1B7C756-4725-4D31-A53D-C0A47976E6BB.scuk
EDIT 10July2009: Article moved here http://searchsqlserver.techtarget.com/generic/0,295582,sid87_gci1265212,00.html
I advise reading the whole article; it gives further hints for identifying disk bottlenecks.
In the System Monitor there are five Physical Disk counters, for each physical disk in the logical drive, that are key to identifying an I/O bottleneck. Each disk should be investigated individually.
1. Avg. Disk Queue Length is the average number of both read and write requests that were queued for the selected disk during the sample interval.
2. Avg. Disk sec/Read is the average time, in seconds, of a read of data from the disk.
3. Avg. Disk sec/Write is the average time, in seconds, of a write of data to the disk.
4. Disk Reads/sec is the rate of read operations on the disk.
5. Disk Writes/sec is the rate of write operations on the disk.
First, on the System Monitor properties window, slide the time interval to include only the period of "slow performance." Then calculate the I/Os per disk as follows.
Suppose you have a RAID 5 configuration with four physical disks, and you had the following information from the System Monitor:

Disk Reads/sec: 120
Disk Writes/sec: 150
Avg. Disk Queue Length: 12

RAID 0 -- I/Os per disk = (reads + writes) / number of disks
RAID 5 -- I/Os per disk = [reads + (4 * writes)] / number of disks
You can use the above RAID calculations to determine the I/Os per disk: [120 reads + (4 * 150 writes)] / 4 physical disks = 180 I/Os per disk. This is over the general rule-of-thumb capacity of 125 I/Os per disk, so this system has a disk bottleneck.
You can determine that with 720 total I/Os, six physical disks would be required to resolve the disk bottleneck (720 / 125 = 5.76, rounded up to 6). Adding even more disks provides room for growth; you should calculate for peak workloads and future growth.
The Avg. Disk Queue Length value is the total across all physical disks in the RAID configuration. In the example above, 12 (avg. disk queue length) / 4 physical disks = 3 queued I/Os per disk. Anything over 2 per disk is reason to investigate further.
Next, look at the latency counters. In general, an Avg. Disk sec/Read of 11-15 ms or lower is good, and an Avg. Disk sec/Write of 12 ms or lower is good. Anything above these numbers is reason to investigate further.
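The arithmetic above is simple enough to script once you have the counter values. A minimal sketch of the per-disk I/O calculation: the RAID 0 and RAID 5 write penalties and the ~125 I/Os-per-disk rule of thumb come from the article text above, while the RAID 1 and RAID 10 penalties (2 physical writes per logical write, standard for mirroring) are assumptions added here:

```python
import math

# Write penalty per RAID level: each logical write costs this many physical I/Os.
# RAID0 and RAID5 values are from the article; RAID1/RAID10 are the usual
# mirroring penalty and are assumed here.
RAID_WRITE_PENALTY = {"RAID0": 1, "RAID1": 2, "RAID5": 4, "RAID10": 2}

def ios_per_disk(reads, writes, disks, raid="RAID5"):
    """Physical I/Os hitting each disk, per the article's formulas."""
    return (reads + RAID_WRITE_PENALTY[raid] * writes) / disks

def disks_needed(reads, writes, raid="RAID5", capacity=125):
    """Disks required to stay under the ~125 I/Os-per-disk rule of thumb."""
    total = reads + RAID_WRITE_PENALTY[raid] * writes
    return math.ceil(total / capacity)

# The worked example: RAID 5, 4 disks, 120 reads/sec, 150 writes/sec.
print(ios_per_disk(120, 150, 4))   # 180.0 -> over 125, disk bottleneck
print(disks_needed(120, 150))      # 6 (720 / 125 = 5.76, rounded up)
print(12 / 4)                      # 3.0 queued I/Os per disk (over 2)
```

Plugging in your own perfmon numbers and array type should tell you quickly whether a reading like 400 on Avg. Disk Queue Length is actually alarming for your spindle count.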
I would love to get some more information on this. Based on your post I believe that we may have a bottleneck problem but I am not sure exactly how to crunch the numbers.
I have included a photo of one sample interval for the statistics you recommended collecting. The numbers in the snapshot are a little higher than the average over a number of samples, but this should at least be a good starting point for discussion.
Just as a validity test, I'll shut down the NTA service to see if I get the same results. I've been told that we are just hammering our DB. The question is: what is the baseline for these types of metrics under low, medium, and heavy loads? Does anyone have something we can reference?