    Statistic Threshold for Disk Queue Length


      I am looking for information on what these values are based on. Are they occurrences or milliseconds? My critical baseline value was 2.81, but what does that mean?

          Disk queue length measures the average number of operations in the queue over the polled time period, so a value like 2.81 is a count of queued operations, not a number of occurrences or a time in milliseconds.  Setting a threshold on it can be tricky because acceptable disk queue lengths vary depending on how many spindles are in the array that the volume is on.  A good SSD array could crunch through a queue of 30000 operations in seconds, but a single local physical disk on a regular server might need 5 minutes just to catch up to that queue, not counting the new operations that back up in the meantime.  The idea is that as long as the queue is being emptied quickly, you are probably fine.
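To put rough numbers on the comparison above, here is a minimal sketch of the drain-time arithmetic. The IOPS figures are assumptions picked only to match the "seconds" versus "5 minutes" example, not measurements from any particular hardware, and `drain_seconds` is a hypothetical helper name.

```python
import math


def drain_seconds(queued_ops, service_iops, arrival_iops=0.0):
    """Estimate how long it takes to empty a queue of queued_ops operations.

    service_iops is how many operations per second the storage completes;
    arrival_iops is how many new operations per second keep arriving.
    Returns math.inf if the storage cannot keep up with arrivals.
    """
    net_iops = service_iops - arrival_iops
    if net_iops <= 0:
        return math.inf
    return queued_ops / net_iops


QUEUE = 30000             # queue size from the example above
SINGLE_DISK_IOPS = 100    # assumed figure for one local physical disk
SSD_ARRAY_IOPS = 10000    # assumed figure for an SSD array

print(drain_seconds(QUEUE, SINGLE_DISK_IOPS))                   # 300.0 (5 minutes)
print(drain_seconds(QUEUE, SSD_ARRAY_IOPS))                     # 3.0
print(drain_seconds(QUEUE, SINGLE_DISK_IOPS, arrival_iops=50))  # 600.0
```

The last call shows why the same queue length is worse on slow storage: with new operations still arriving, the single disk takes twice as long to catch up.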


          Using the baseline can give you a ballpark idea of what normal queue lengths on the server look like, but if there were no severe storage events during the baseline period, you will only know what normal looks like, with no reference point for a "high" value.  Deciding where to set a threshold that you would actually want to alarm or intervene on is fuzzier.  You may find it more effective to remove the threshold from queue length and focus on something like disk latency values over ~20 ms, since that is a more obvious indicator of overwhelmed storage.
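If you go the latency route, the check could look something like the sketch below. It is only an illustration: the sample values are made up, `sustained_breaches` is a hypothetical helper, and requiring three polls in a row above the limit is my own assumption to avoid alarming on a single spike. Only the ~20 ms figure comes from the suggestion above.

```python
def sustained_breaches(latencies_ms, threshold_ms=20.0, min_consecutive=3):
    """Return the indices of polls where latency has been above
    threshold_ms for at least min_consecutive polls in a row."""
    alerts = []
    run = 0  # length of the current streak of polls above the threshold
    for i, value in enumerate(latencies_ms):
        run = run + 1 if value > threshold_ms else 0
        if run >= min_consecutive:
            alerts.append(i)
    return alerts


# Made-up polled disk latency values in milliseconds.
samples = [4.0, 6.5, 25.0, 8.0, 22.0, 31.0, 40.0, 45.0, 9.0]

print(sustained_breaches(samples))  # [6, 7]
```

The lone 25 ms spike at index 2 is ignored, while the run starting at index 4 triggers once it has lasted three polls.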


