Universal Monitoring and Alerting on free space for all logical disks across all servers

Version 9

    [UPDATE] // 2014-05-23

    I have dramatically improved the monitoring script by adding warning/critical level differentiation, the ability to set exclusions and overrides, more robust error checks, etc.

    Please find it here:

    Universal Disk Free Space Monitoring (One Template Will Handle All Logical Disks + Exceptions And Overrides)

    ======================================================

    [UPDATE] // 2014-03-24

    I have just come up with a super-duper-cool way of adding all the logic below to a page (the summary page, for example) as a simple resource that lists all problematic volumes. See the solution at the end of the article.

    ======================================================

     

    The following is a two-part solution. You can use either part independently, but I have found that the combination of the two works best for me:

     

    After implementing the below solution you will be able to:

     

    Part #1: - MONITORING:

    1. Monitor any number of servers with any number of disks with just one application component per server

    2. Effectively monitor both large disks (measured in terabytes) and small disks (measured in gigabytes)

    3. Utilise just one component per server to monitor any number of logical disks (saving you on licenses)

     

    Part #2: - ALERTING:

    1. Alert effectively on both large disks (measured in terabytes) and small disks (measured in gigabytes)

    2. Quickly and easily alter threshold values for certain disks based on specific requirements (very granular custom thresholds, per disk)

     

    =======================================================

     

    MONITORING

     

    (1)

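    Let's first create a WMI script to monitor disk space across all logical disks on the server. By default it sets the following threshold:

    IF FREE SPACE < 4% AND FREE SPACE < 20GB, an alert is triggered. Because both conditions must hold, a large disk with plenty of gigabytes left does not alert on percentage alone, and a small disk does not alert on gigabytes alone. This generally works well for both small and large disks.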

     

    *** script attached as plain text ***
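    Independently of the attached script, the default rule can be illustrated with a minimal Python sketch. The function name and parameters are made up for illustration, and gigabytes are assumed to be decimal (1GB = 10^9 bytes):

```python
def is_low_on_space(free_bytes, total_bytes, pct_threshold=4.0, gb_threshold=20.0):
    """Return True only when free space is below BOTH thresholds."""
    if total_bytes <= 0:
        return False  # nothing sensible to report for an empty/unknown volume
    free_pct = 100.0 * free_bytes / total_bytes
    free_gb = free_bytes / 1e9  # decimal gigabytes (assumption)
    return free_pct < pct_threshold and free_gb < gb_threshold


GB = 10**9
TB = 10**12

# Small disk: 50GB total, 1.5GB free -> 3% free and under 20GB: triggers.
small_disk_alert = is_low_on_space(1.5 * GB, 50 * GB)
# Large disk: 10TB total, 300GB free -> 3% free but 300GB left: no alert.
large_disk_quiet = is_low_on_space(300 * GB, 10 * TB)
# Small disk: 100GB total, 15GB free -> under 20GB but 15% free: no alert.
small_disk_quiet = is_low_on_space(15 * GB, 100 * GB)

print(small_disk_alert, large_disk_quiet, small_disk_quiet)
```

    The second case is the point of the combined rule: a percentage-only threshold would alert on a terabyte-sized disk that still has hundreds of gigabytes free.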

     

    (2)

    Create a SAM template with a Windows Script Monitor component containing the above script, and assign the application to the servers you want to monitor

     

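    - Supply ${IP} 4 20000 as the script arguments in the SAM template (4 is the percentage threshold; 20000 is presumably the 20GB threshold expressed in megabytes)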

     

    - Here is what you should get in the front end

     

    Capture24.PNG
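    To show how a single component can summarise every disk on a server, here is a rough Python sketch of the evaluation step. It is not the attached script. It assumes the two numeric arguments are a percentage and a megabyte value, and it assumes the usual SAM script monitor conventions as I understand them: the script prints "Statistic:" and "Message:" lines and signals status through its exit code (0 = Up, 3 = Critical). The function name and the disk tuples are hypothetical:

```python
def evaluate_disks(disks, pct_threshold, mb_threshold):
    """disks: list of (name, free_bytes, total_bytes) tuples.

    Returns (output_lines, exit_code) for one script monitor component.
    """
    low = []
    for name, free_bytes, total_bytes in disks:
        if total_bytes <= 0:
            continue  # skip volumes that report no size
        free_pct = 100.0 * free_bytes / total_bytes
        free_mb = free_bytes / 1e6  # decimal megabytes (assumption)
        if free_pct < pct_threshold and free_mb < mb_threshold:
            low.append("%s %.1f%% free" % (name, free_pct))

    message = ("Low disks: " + ", ".join(low)) if low else "All disks OK"
    lines = ["Statistic: %d" % len(low), "Message: " + message]
    exit_code = 3 if low else 0  # assumed convention: 3 = Critical, 0 = Up
    return lines, exit_code


# Hypothetical sample data standing in for what a WMI query would return.
sample = [("C:", 1.5e9, 50e9), ("D:", 300e9, 10e12)]
lines, exit_code = evaluate_disks(sample, 4, 20000)
print("\n".join(lines))
```

    The statistic here is simply the number of disks breaching the rule, so one component reports on any number of logical disks.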

     

    (3)

    Create a group with a dynamic query that includes all those applications from all servers. This tells you at any point in time whether all your volumes are healthy. I use this group on my dashboard as well

     

    Capture25.PNG

     

    Capture26.PNG

     

    ALERTING

     

    (4)

    Create 3 additional custom properties for "Volumes"

    Capture9.PNG

    We will use them as follows:

     

    v_ovrd_prcnt: to override disk percentage value threshold

    v_ovrd_bytes: to override bytes value threshold

    v_ovrd_desc: to specify some additional notes for other engineers/users about why we have overridden the default value

     

    (5)

    Use the same principle as above to configure your alert rule, and include the possibility of overriding the default rule if necessary

     

    (it looks a bit lengthy and complicated, but works perfectly well)

     

    Capture27.PNG

     

    The result is:

     

    * by default you will receive an alert IF FREE SPACE < 5% AND FREE SPACE < 20GB on any disk on any server

    * to override this rule, simply set the custom property value for either the percentage override or the bytes override (or both, if you need to) for the volume in the SAM front end. If you configure just one override, for example v_ovrd_prcnt = 30%, the alert will trigger below 30% free space, regardless of how much space in bytes you have left.

     

    * Note that I have used the 5% and 20GB rule in alerting, as opposed to 4% and 20GB in the monitoring script above. This is because I would like to receive an email alert slightly before the problem ends up on the dashboard for everyone to see (most of the time I am able to fix it quickly and silently, without making it visible to the whole business). For you it might be the other way round.
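    The override behaviour described in these points can be sketched in Python as follows. This is only an illustration: the function and argument names are made up, and it assumes the alert conditions mirror the four-branch resource filter shown at the end of the article:

```python
def should_alert(free_bytes, total_bytes, ovrd_prcnt=None, ovrd_bytes=None,
                 default_prcnt=5.0, default_bytes=20 * 10**9):
    """Decide whether a volume should alert, honouring per-volume overrides."""
    free_pct = 100.0 * free_bytes / total_bytes
    if ovrd_prcnt is None and ovrd_bytes is None:
        # No overrides: default rule, both conditions must hold.
        return free_pct < default_prcnt and free_bytes < default_bytes
    if ovrd_bytes is None:
        # Only the percentage override is set: bytes are ignored.
        return free_pct < ovrd_prcnt
    if ovrd_prcnt is None:
        # Only the bytes override is set: percentage is ignored.
        return free_bytes < ovrd_bytes
    # Both overrides are set: both conditions must hold.
    return free_pct < ovrd_prcnt and free_bytes < ovrd_bytes


GB = 10**9

# 2000GB volume with 500GB free (25% free).
default_result = should_alert(500 * GB, 2000 * GB)                 # default rule
prcnt_result = should_alert(500 * GB, 2000 * GB, ovrd_prcnt=30)    # 30% override
# 100GB volume with 40GB free, alert when less than 50GB remains.
bytes_result = should_alert(40 * GB, 100 * GB, ovrd_bytes=50 * GB)

print(default_result, prcnt_result, bytes_result)
```

    With only v_ovrd_prcnt set to 30, the 2000GB volume alerts at 25% free even though 500GB remain, which matches the "regardless of how much space in bytes you have left" behaviour.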

     

    Capture28.PNG

     

    ====================================================

    [UPDATE] // 2014-03-24

     

    I have just come up with a super-duper-cool way of adding all the above logic to a page (the summary page, for example) as a simple resource.

     

    Here we go:

     

    (1)

    Customize Page > Add Resource > Top XX Volumes by disk space used

    A001rr.JPG

    (2)

    Click "Edit" on the resource > change the name and copy-paste the filter below

    A001tt.JPG

    (Volumes.Caption not like '/*') AND

    (Volumes.VolumeType = 'Fixed Disk') AND

    (Nodes.Status = '1') AND

    (

    (

      (Volumes.VolumeSpaceAvailable < 20000000000) AND

      (Volumes.VolumePercentUsed > 96) AND

      (Volumes.v_ovrd_bytes IS NULL) AND

      (Volumes.v_ovrd_prcnt IS NULL)

    ) OR

    (

      (Volumes.v_ovrd_prcnt IS NULL) AND

      (Volumes.v_ovrd_bytes IS NOT NULL) AND

      (Volumes.VolumeSpaceAvailable < Volumes.v_ovrd_bytes)

    ) OR

    (

      (Volumes.v_ovrd_prcnt IS NOT NULL) AND

      (Volumes.v_ovrd_bytes IS NULL) AND

      (Volumes.VolumePercentUsed > (100 - Volumes.v_ovrd_prcnt))

    ) OR

    (

      (Volumes.v_ovrd_bytes IS NOT NULL) AND

      (Volumes.v_ovrd_prcnt IS NOT NULL) AND

      (Volumes.VolumeSpaceAvailable < Volumes.v_ovrd_bytes) AND

      (Volumes.VolumePercentUsed > (100 - Volumes.v_ovrd_prcnt))

    )

    )

    (3)

    Voilà

    A001yy.JPG

     

    (4) - BONUS

    If you are not using the v_ovrd_bytes or v_ovrd_prcnt custom properties for granular per-disk threshold management, the following filter will work in any environment out of the box:

     

    (Volumes.Caption not like '/*') AND

    (Volumes.VolumeType = 'Fixed Disk') AND

    (Nodes.Status = '1') AND

    (Volumes.VolumeSpaceAvailable < 20000000000) AND

    (Volumes.VolumePercentUsed > 96)
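    If you want different thresholds, the two numbers in this filter follow directly from the rule: 20GB becomes 20000000000 bytes, and "less than 4% free" becomes "more than 96% used". A small Python helper (hypothetical, just to show the arithmetic) can generate the filter text for other values:

```python
def build_volume_filter(min_free_gb=20, min_free_percent=4):
    """Build the simple (no overrides) resource filter for given thresholds."""
    min_free_bytes = int(min_free_gb * 10**9)   # decimal gigabytes -> bytes
    max_used_percent = 100 - min_free_percent   # free % -> used %
    return " AND ".join([
        "(Volumes.Caption not like '/*')",
        "(Volumes.VolumeType = 'Fixed Disk')",
        "(Nodes.Status = '1')",
        "(Volumes.VolumeSpaceAvailable < %d)" % min_free_bytes,
        "(Volumes.VolumePercentUsed > %s)" % max_used_percent,
    ])


default_filter = build_volume_filter()
custom_filter = build_volume_filter(min_free_gb=50, min_free_percent=10)
print(default_filter)
print(custom_filter)
```

    The generated text is a single line; paste it into the resource filter box in the same way as the filter above.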

     

    --

    Thank you,

    Alex