
Logical Disk Queue Length problem

I'm monitoring a system that periodically triggers an alert on the logical disk queue length. The value sits at 0 or slightly above most of the time, which seems normal. But it frequently spikes for a single interval to millions or even billions, and values that large look wrong. I'm not sure whether the problem is with APM or with the performance data that Windows is passing to Orion. I'm currently running Performance Monitor (PerfMon) against the same counter to see whether the spikes show up there as well.

The system is Windows Server 2008 with SP1.

The APM is version 4.0.2 SP4.

If anyone has any ideas or there's any other information I can provide, let me know.

Thanks.

Graeme

  • The first thought that comes to mind is to try a performance counter monitor rather than a WMI monitor.  Sometimes WMI responses from the Windows server are problematic.

  • Check to see if there's a pattern when these alerts trigger. High disk queue length could be caused by things like backups, antivirus scans, database maintenance, etc.  

  • Update: I ran PerfMon for a little while and noticed a peak of 220,002. That suggests the problem isn't with Orion, but with the performance counter data that Windows itself is reporting.

    Thanks for the suggestion Jason, but I checked the APM monitor and it says it's using RPC, not WMI.

    I think the point is that these numbers aren't legitimate. I don't think the system is actually that busy; it looks like a reporting error. I don't see any way the disk serviced 220,000 requests in the ten seconds it took the spike to drop off.

    I'm continuing to run PerfMon, but I've added the individual disks to the monitor to see whether one in particular is triggering the alert. Until now I was only monitoring _Total.

    Graeme
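
The plausibility argument in the last reply can be sketched in Python. This is only a rough illustration, not anything APM or PerfMon provides: the function names, the spike thresholds, and every sample value other than the 220,002 peak are assumptions. It computes the request rate a queue peak would imply if it really drained in ten seconds, and flags samples that tower over an otherwise near-zero baseline.

```python
from statistics import median


def implied_iops(queue_peak, drain_seconds):
    """Requests per second the disk would have to complete to drain the queue."""
    if drain_seconds <= 0:
        raise ValueError("drain_seconds must be positive")
    return queue_peak / drain_seconds


def isolated_spikes(samples, factor=1000.0, floor=1.0):
    """Return indexes of samples that exceed factor * max(median, floor).

    The floor keeps a baseline of 0 from making every non-zero sample
    look like a spike. Both defaults are arbitrary illustration values.
    """
    baseline = max(median(samples), floor)
    return [i for i, value in enumerate(samples) if value > factor * baseline]


if __name__ == "__main__":
    # 220,002 is the peak quoted above; the other readings are made up.
    readings = [0, 0.2, 0, 1.1, 220_002, 0.4, 0]
    print("implied requests/sec:", implied_iops(220_002, 10))
    print("suspect sample indexes:", isolated_spikes(readings))
```

A peak of 220,002 draining in ten seconds works out to roughly 22,000 requests per second. If that is well beyond what the storage behind the volume can sustain, a counter or reporting error is a more likely explanation than real load.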