21 Replies Latest reply on May 28, 2016 10:22 AM by aLTeReGo

    Linux Drive Monitors not Accounting for Reserved Space

    bobross

      As I'm sure a lot of you are aware, net-snmp doesn't play nicely with the reserved space on some volumes.  This results in a disparity between the % Utilization readings inside Solarwinds and those seen by admins when running df.  Unfortunately, it appears that this is actually an issue with the way net-snmp returns data:

       

      # snmpwalk -v 2c -c public localhost hrStorage
      HOST-RESOURCES-MIB::hrStorageIndex.32 = INTEGER: 32
      HOST-RESOURCES-MIB::hrStorageType.32 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
      HOST-RESOURCES-MIB::hrStorageDescr.32 = STRING: /
      HOST-RESOURCES-MIB::hrStorageAllocationUnits.32 = INTEGER: 4096 Bytes
      HOST-RESOURCES-MIB::hrStorageSize.32 = INTEGER: 2766037
      HOST-RESOURCES-MIB::hrStorageUsed.32 = INTEGER: 1000361
      
      
      

       

      As you can see, net-snmp only returns Used and Size, but not available.  This leaves it up to the monitoring software to perform a calculation without all of the relevant data, most notably the values available in df's Available column:

      # df --block=4096
      Filesystem           4K-blocks      Used Available Use% Mounted on
      /dev/sda2              2766037   1000363   1622897  39% /
      
      
      

      The available space is actually reported in two different ways, bfree and bavail:

      statfs("/", {f_type="EXT2_SUPER_MAGIC", f_bsize=4096, f_blocks=2766037, f_bfree=1765675, f_bavail=1622898, f_files=2858240,
                f_ffree=2756839, f_fsid={0, 0}, f_namelen=255, f_frsize=4096}) = 0
      
      
      

       

      Taking a look at the source code for df we can see that both bavail and bfree are used.

      df.c

      input_units = fsu.fsu_blocksize;
            output_units = output_block_size;
            total = fsu.fsu_blocks;
            available = fsu.fsu_bavail;
            negate_available = (fsu.fsu_bavail_top_bit_set
              & (available != UINTMAX_MAX));
            available_to_root = fsu.fsu_bfree;
      [..]
            used = total - available_to_root;
      
      
      

       

      Note that bfree is assigned to 'available_to_root' while bavail is assigned to 'available'.  Without getting too technical, the end result for the % used is something like this:

      ( (used * 100) / (used + available) ) + 1
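
As a rough check, the formula above can be run against the statfs values shown earlier. This is a minimal Python sketch, not df's actual code: the function name `df_use_percent` is made up for the example, and the `+ 1` simply follows the simplified formula in this post (as far as I can tell, df itself rounds up rather than always adding one).

```python
def df_use_percent(blocks, bfree, bavail):
    """Approximate df's Use% from statfs-style block counts."""
    used = blocks - bfree          # df: used = total - available_to_root
    nonroot_total = used + bavail  # blocks a non-root user can ever fill
    return (used * 100) // nonroot_total + 1


# Values from the statfs() call on "/" shown above.
print(df_use_percent(blocks=2766037, bfree=1765675, bavail=1622898))  # 39
```

The result matches the 39% in the df output because the denominator leaves out the reserved blocks (f_bfree minus f_bavail).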
      
      
      

       

      This ends up giving a different value from what we see in Solarwinds.  Based on my research, it seems that in order to obtain % used, Solarwinds is using the following formula:

      hrStorageUsed / hrStorageSize
      
      
      

       

      This ends up resulting in a different percentage of utilization.  In this example, Solarwinds is only reporting 36% utilization as opposed to the 39% shown in df.  While a 3% difference may not seem like much, the problem becomes much more apparent as utilization approaches 100%:

      # df --block=4096
      Filesystem           4K-blocks      Used Available Use% Mounted on
      /dev/mapper/VolGroup00-LogVol00
                             1249815    890351    295976  76% /
      # snmpget -v 2c -c public localhost hrStorage{Index,Type,Descr,AllocationUnits,Size,Used}.4
      HOST-RESOURCES-MIB::hrStorageIndex.4 = INTEGER: 4
      HOST-RESOURCES-MIB::hrStorageType.4 = OID: HOST-RESOURCES-TYPES::hrStorageFixedDisk
      HOST-RESOURCES-MIB::hrStorageDescr.4 = STRING: /
      HOST-RESOURCES-MIB::hrStorageAllocationUnits.4 = INTEGER: 4096 Bytes
      HOST-RESOURCES-MIB::hrStorageSize.4 = INTEGER: 1249815
      HOST-RESOURCES-MIB::hrStorageUsed.4 = INTEGER: 890351
      
      
      

       

      However Solarwinds is showing the following information:

       

      Size: 4.8 GB
      Space Used: 3.4 GB
      Space Available: 1.4 GB
      Percent Used: 71 %
      Percent Available: 29 %
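
To make the gap concrete, here is a small Python sketch that applies both formulas to the two example volumes above. It assumes, as inferred earlier in this post, that Solarwinds computes hrStorageUsed / hrStorageSize; the function names are invented for the illustration, and df's final rounding up is left out.

```python
def snmp_percent(hr_used, hr_size):
    """Utilization as hrStorageUsed / hrStorageSize (reserved space ignored)."""
    return 100.0 * hr_used / hr_size


def df_percent(used, available):
    """Utilization as df sees it: used / (used + available)."""
    return 100.0 * used / (used + available)


# (hrStorageUsed, hrStorageSize, df Used, df Available) from the two examples
volumes = {
    "/ on sda2": (1000361, 2766037, 1000363, 1622897),
    "/ on LogVol00": (890351, 1249815, 890351, 295976),
}
for name, (hr_used, hr_size, used, avail) in volumes.items():
    print(f"{name}: snmp={snmp_percent(hr_used, hr_size):.1f}% "
          f"df={df_percent(used, avail):.1f}%")
```

Truncated to whole numbers, the SNMP-side figures come out at 36 and 71, while the df-side figures are 38 and 75 before df rounds them up to 39 and 76.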

       

      At this point we are looking at a 5% difference.  As you approach 100% the difference becomes even greater.  We first discovered this problem ourselves when the 95% threshold alerts that we had set up never fired even though the volumes had completely filled, to the point that applications ceased functioning.

       

      Unfortunately, we are still left with the problem of how to properly monitor these volumes.  I don't believe that Solarwinds can modify the formula because Net-SNMP only provides total and used, so we are left with modifying other aspects.

       

      One method would be to arbitrarily reduce the threshold; however, we can't be sure that all volumes are limited to 5% reserved space.  A second method (which we have pursued) is to implement script monitors for all volumes.  We started running into issues with this as the component count for these monitors quickly approached 1500, with more being added every week.  Our last course of action is to modify the way that Net-SNMP sends data for these devices.  To that end I've started to come up with a possible solution; however, having never dealt with something like this before, I'd appreciate any advice.
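
To illustrate why the first method is fragile, the sketch below works out which SNMP-side threshold corresponds to a 95% threshold in df for a few reserved-space settings. It assumes used + available is roughly size × (1 − reserved fraction) and ignores df's rounding; `snmp_threshold` is a made-up helper, not anything Solarwinds provides.

```python
def snmp_threshold(df_threshold_pct, reserved_fraction):
    """SNMP-side percentage equivalent to a df-side threshold."""
    return df_threshold_pct * (1.0 - reserved_fraction)


for reserved in (0.0, 0.01, 0.05):
    print(f"reserved={reserved:.0%}: "
          f"alert at {snmp_threshold(95, reserved):.2f}%")
```

A single lowered threshold is therefore only right for volumes that share the same reserved percentage, which is exactly what we can't guarantee.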

       

      snmpd.conf gives you the ability to implement extension commands that can be used to change the data before it is sent out.  My thinking here is that this can be used to pre-shrink the total size by the reserved space.  This would bring the results in Solarwinds closer to what is shown in df, although it appears that we will still be 1% off.  This should be enough, though, to allow us to properly establish monitoring and alerting on Linux volumes through SNMP.  The pass would be set up on HOST-RESOURCES-MIB::hrStorageSize and use the information available in `df --block=4096` to determine the new size:

       

      # df --block=4096
      Filesystem           4K-blocks      Used Available Use% Mounted on
      /dev/sda2              2766037   1000363   1622897  39% /
      /dev/sda5               254802     49519    205283  20% /var
      
      
      
      

       

      The new size would be found by using the following formula:

      Used + Available = Size
      
      
      

       

      This ends up giving us a fairly accurate result whether we are at the default 5% reserved space (/) or at 0% reserved space (/var)

      /

      1000363 + 1622897 = 2623260
      2623260 / 2766037 = 0.948

      /var

      49519 + 205283 = 254802

      Now when Solarwinds runs the used / total formula we get the following result:

      1000363 / 2623260 = 0.381
      49519 / 254802 = 0.194
      
      
      

       

      These values are only 1% off from what is shown by df.  The same holds true for the higher utilization shown before:

      890351 + 295976 = 1186327
      890351 / 1186327 = 0.750
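
The same adjustment can be computed directly from the filesystem statistics instead of parsing df. This is a minimal Python sketch under that assumption; `adjusted_size_blocks` is a hypothetical helper, and the sample numbers are the statfs values for "/" from earlier in this post. Counts are in filesystem blocks (f_frsize bytes each).

```python
import os
from types import SimpleNamespace


def adjusted_size_blocks(st):
    """Return (used, adjusted_size) in filesystem blocks.

    adjusted_size is used + available-to-non-root, i.e. the total with
    the reserved blocks taken out.
    """
    used = st.f_blocks - st.f_bfree
    return used, used + st.f_bavail


# Replaying the statfs() values for "/" shown earlier.
sample = SimpleNamespace(f_blocks=2766037, f_bfree=1765675, f_bavail=1622898)
used, size = adjusted_size_blocks(sample)
print(used, size, round(used / size, 3))

# On a live Unix system the same function accepts os.statvfs().
if hasattr(os, "statvfs"):
    print(adjusted_size_blocks(os.statvfs("/")))
```

The used count here differs from df's by one block only because the statfs call and the df run were taken at slightly different moments.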
      
      
      

       

      The reason for my lengthy post is that I've never done anything like this before and was hoping for some advice on how to proceed by anyone that has used an SNMP extension before.  If nothing else, I'm hoping that my research and progress will benefit others.

       

      I'm thinking of testing with a pass and then actually implementing it using pass_persist.  The actual script will be written in Perl and will have to be deployed to ~400 different servers.  What kind of testing would be needed, given that most of these are production devices?  Backing out should be as simple as restoring the backed-up snmpd.conf file and removing the .pl.  Also, is pass/pass_persist the best way to go about this?  From my reading I don't think that exec or extend will allow me to completely override hrStorageSize; these seem to be more for additions than replacements.  Are there any caveats I need to worry about when implementing the script?  I've already noted the empty line sent on shutdown and the PING/PONG handshake for pass_persist.

       

      One major hurdle I still need to cross is how to get the description of the volume with just the index and not having to resort to another snmpget just for this.  Any advice on getting this is much appreciated.

       

      Thanks in advance,

      Bob

        • Re: Linux Drive Monitors not Accounting for Reserved Space
          bobross

          So I've gone ahead and created a test script:

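          #!/usr/bin/perl
          
          use strict;
          use warnings;
          
          # snmpd invokes a pass script as "PROG -g OID", so the OID is the second argument.
          my $OID = $ARGV[1];
          my $volume = "/";
          # Used + Available (columns 3 and 4 of df) for the volume, in 4096-byte blocks.
          my $result = `df -a --block=4096 | grep ' $volume\$' | awk '{print \$3+\$4}'`;
          
          # Reply format for a GET: the OID, the type, then the value, one per line.
          print $OID . "\n";
          print "INTEGER\n";
          print $result;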
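
For comparison, here is the same idea sketched in Python without shelling out to df. It is only an illustration under a few assumptions: the helper names are invented, the mount point is hard-coded to "/" like the test script, and the type is written in lowercase as the snmpd.conf man page lists it.

```python
import os
import sys


def size_in_4k_blocks(mount):
    """used + available for `mount`, expressed in 4096-byte blocks."""
    st = os.statvfs(mount)
    used = st.f_blocks - st.f_bfree
    return (used + st.f_bavail) * st.f_frsize // 4096


def pass_get_response(oid, value):
    """Three-line reply snmpd expects from a pass script: OID, type, value."""
    return f"{oid}\ninteger\n{value}\n"


if __name__ == "__main__" and len(sys.argv) >= 3 and sys.argv[1] == "-g":
    # snmpd calls the script as: PROG -g OID
    sys.stdout.write(pass_get_response(sys.argv[2], size_in_4k_blocks("/")))
```

Like the Perl version, this ignores which index was requested, so it is only useful for testing the pass mechanism itself.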
          
          
          
          

           

          And tried adding in the pass to snmpd.conf:

          #pass  .1.3.6.1.2.1.25.2.3.1.5 /root/snmptest.pl
          #pass HOST-RESOURCES-MIB::hrStorageSize /root/snmptest.pl
          
          

           

          Unfortunately neither seems to be working properly.  The first gives a segfault error when I try to restart snmpd:

          # service snmpd restart
          Shutting down snmpd:                                       [  OK  ]
          Starting snmpd: /bin/bash: line 1:  1660 Segmentation fault      /usr/local/sbin/snmpd
                                                                     [FAILED]
          

           

          And the second doesn't seem to be doing anything...

          # ./snmptest.pl -g 1.2.1
          1.2.1
          INTEGER
          2623260
          # snmpget -v 2c -c public localhost HOST-RESOURCES-MIB::hrStorageSize.32
          HOST-RESOURCES-MIB::hrStorageSize.32 = INTEGER: 2766037
          

           

          It's still returning the old value instead of the one forced by the script.  Unfortunately, my Google-fu seems to be failing me as far as troubleshooting this problem goes.  As far as man snmpd.conf is concerned (as well as the notes in the snmpd.conf file), I seem to have entered it correctly.

           

          Anyone encountered this before?

           

          Also, still hoping to find some way to get the volume name with just the index without needing the community string.

           

          Thanks again,

          Bob

            • Re: Linux Drive Monitors not Accounting for Reserved Space
              aLTeReGo

              I can't claim to be a NET-SNMP expert, but you may find sending an email to the NET-SNMP mail list beneficial. It appears to be fairly active.

               

               

              Linux Forums are another alternative, but I'd try the email list first as it appears to be the recommended way to get NET-SNMP questions answered.

                • Re: Linux Drive Monitors not Accounting for Reserved Space
                  bobross

                  Thanks alterego.  I've actually got an email out to net-snmp users regarding this.

                   

                  http://sourceforge.net/mailarchive/message.php?msg_id=29541217

                   

                  Long story short...

                   

                  I don't get the seg fault if I set up the pass on either hrStorage or hrStorageSize.index; unfortunately, neither is an ideal solution, since the former requires rewriting the entire hrStorage table and the latter means that a pass will need to be set up for every possible index that could be used.

                   

                  Here's the contents of the email I sent in case anyone is interested in the details:

                  Due to net-snmp not returning data relating to reserved space on volumes we have decided to implement a change that will return hrStorageSize as used + available. I'm running into two problems, the first of which is pretty major for this. I have set up a test script using the pass feature that returns arbitrary information and works just fine when run from the command line. It also tests properly through snmpget when I am able to get it running. The problem I'm facing is that I get a seg fault when I attempt to set up the pass at hrStorageSize:

                   

                  pass .1.3.6.1.2.1.25.2.3.1.5 /root/snmptest.pl

                   

                  The seg fault occurs after adding the line to snmpd.conf when I attempt to restart snmpd.

                   

                  The odd thing that I've found is that I'm able to add in a pass at either hrStorage or hrStorageSize.index:

                   

                  pass .1.3.6.1.2.1.25.2.3.1 /root/snmptest.pl

                  pass .1.3.6.1.2.1.25.2.3.1.5.32 /root/snmptest.pl

                   

                  In this instance the script does return data, so I am assuming that it isn't a problem with the script itself. If I were able to set up the pass at hrStorageSize, it would be a fairly simple matter of just overriding the value returned. Using either hrStorage or hrStorageSize.index instead adds quite a bit of complexity (the former means rewriting all of the other returns available from hrStorage and the latter means an excessive number of pass entries in snmpd.conf). I've done quite a bit of searching relating to this, but so far I've come up empty.

                   

                  Any ideas as to how to get the pass to work properly against hrStorageSize?

                   

                  My second question is how does net-snmp determine the index values for volumes? Assuming that I'm able to get hrStorageSize overridden I'll still need to figure out the name of the volume with just the index. I'd prefer not to have to use an snmpget inside of the script itself if at all possible. Is there a table available somewhere that lists the current index values by name? Again, many hours of searching have resulted in no progress on this... I also took a look at an strace of snmpget but wasn't able to see where the index was being pulled from.

                   

                  Thanks in advance,

                  Sheppy

                • Re: Linux Drive Monitors not Accounting for Reserved Space
                  bobross

                  Well after some additional digging I think I found the reason I wasn't able to pass on hrStorageSize.  I checked out the log and there was a conflict between registration of the OID in the MIB and in snmpd.conf:

                   

                  snmpd -Lo
                  duplicate registration: MIB modules host/hr_storage and pass (oid .1.3.6.1.2.1.25.2.3.1.5).
                  

                   

                  By changing the priority of the pass I am now able to pass on hrStorageSize and successfully test:

                   

                  pass -p 50 .1.3.6.1.2.1.25.2.3.1.5 /root/snmptest.pl
                  

                   

                  Now it's just a matter of finding the best method to reference the volume name based on the index.  I would prefer not to run an snmpget because of the added overhead as well as needing to dynamically determine the community string (this will be deployed on a few hundred servers with varying SNMP strings).

                   

                  I'll post additional info as I make more progress on this thing.

                • Re: Linux Drive Monitors not Accounting for Reserved Space
                  bobross

                  Just a little extra info here.  I was able to get SNMPD to start properly.  It turns out that both the MIB and snmpd.conf were attempting to register the OID at the same time.  By setting the priority of pass lower than the default I'm able to get the pass to work properly.

                  • Re: Linux Drive Monitors not Accounting for Reserved Space
                    bobross

                    So I've completed the pass_persist script and it functions properly when run manually from the command line.  I'm able to feed it values and get the correct responses:

                     

                    [root@MST-R7-DEV snmp]# ./newHrStorage.pl
                    PING
                    PONG
                    get
                    hrStorageSize.32
                    hrStorageSize.32
                    INTEGER
                    2623260
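
For readers unfamiliar with pass_persist, the exchange above can be sketched as a small read-reply loop. This is a Python illustration rather than the Perl script from this thread: `serve` and the lookup table are invented for the example, set requests are left out, and a real snmpd would pass numeric OIDs rather than the symbolic name typed by hand above.

```python
import io


def serve(stream_in, stream_out, lookup):
    """Answer pass_persist requests read from stream_in.

    lookup(oid) returns (type, value) or None when the OID is unknown.
    """
    while True:
        line = stream_in.readline()
        command = line.strip()
        if line == "" or command == "":
            break  # EOF, or the empty line snmpd sends on shutdown
        if command == "PING":
            stream_out.write("PONG\n")
        elif command == "get":
            oid = stream_in.readline().strip()
            answer = lookup(oid)
            if answer is None:
                stream_out.write("NONE\n")
            else:
                stream_out.write(f"{oid}\n{answer[0]}\n{answer[1]}\n")
        elif command == "getnext":
            stream_in.readline()          # consume the OID line
            stream_out.write("NONE\n")    # table walking not implemented here
        else:
            stream_out.write("NONE\n")
        stream_out.flush()  # snmpd waits on the pipe, so never buffer replies


# Replay the manual session shown above.
table = {"hrStorageSize.32": ("integer", 2623260)}
demo_out = io.StringIO()
serve(io.StringIO("PING\nget\nhrStorageSize.32\n\n"), demo_out, table.get)
print(demo_out.getvalue(), end="")
```

In real use the loop would be started as `serve(sys.stdin, sys.stdout, lookup)`, with `lookup` doing the index-to-volume mapping discussed in the rest of this post.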
                    

                     

                    Unfortunately, it doesn't seem to work properly when included in snmpd.conf as a pass_persist.  After quite a bit of troubleshooting it seems that you can't run snmp queries inside of an snmp query.  This is a major obstacle to making this script in any way elegant.  The problem that I run into is that SNMP only passes the OID, so I needed to figure out the type and name of the volume from just an index number.  Since these index numbers aren't available outside of SNMP I had to script in a few snmpget requests to gather this information.

                     

                    Right now I'm left with 3 options, none of which I really like as they add a great deal to the overall complexity of the script (which increases the margin for error).

                     

                    1. "Hack" the snmpd start script in /etc/rc.d/init.d - I would inject a line into the snmpd startup script that triggers another script which gathers the index, descr, and type information and dumps it into a file that is then read by the pass_persist script.  This is the easiest to implement, but it's not without its faults...
                      1. Ugly - It's a damn ugly solution.
                      2. Requires startup using service control - Since we don't admin the systems that this is going to be put on, there is a chance that the admin will just run the snmpd command to start SNMP.  If they do this then the information gathering script won't be triggered and the text file won't be populated.
                      3. Reindexing - If volumes get reindexed between restarts then the information in the text file won't match what is expected from SNMP.  This could be worked around by putting in some checks in the pass_persist script that rerun the information gathering script on failure.
                    2. Thread the pass_persist script - I would use a separate thread in the pass_persist script that just works on gathering data from the OS.  This would mean that the information is more up to date than with option 1, but because the process isn't spawned until it is first called by snmp, there might be some information loss shortly after startup of SNMP, before the information gathering thread is able to populate the variables that the main thread uses to determine name/type for volumes.
                    3. Modify net-snmp source - I would modify the source code of net-snmp directly to return the modified total.  This would have been my primary method, but unfortunately our data center team doesn't have any sort of package management in place.  There also doesn't seem to be a real standard for architecture, which makes recompiling and creating the necessary packages a problem.

                     

                    I'll be talking with a few of the guys on my team about the potential problems with each.  Options 2 and 3 would be the easiest to distribute, but even if we decide to go with option 1 I'll try to get some code up here so that others can benefit from it as well.

                    • Re: Linux Drive Monitors not Accounting for Reserved Space
                      bobross

                      After reviewing the previous options, we decided to just go ahead with modifying the net-snmp source and dealing with the distribution issues.  Since RHEL only seems to have old versions of Net-SNMP available (5.3 or 5.4) in the primary repository this will give us the chance to get all of the Linux boxes running the latest LTS version of Net-SNMP (5.7.1).  The source change was pretty easy and I've created a patch.  Once I've had some time to verify the patch doesn't cause any unforeseen problems I'll add it here for others to use.

                       

                      There are a few things to note about this method:

                      1. Total Size values will differ between Solarwinds and df.  Since we don't have a way to modify the formula used by Solarwinds we'll need to account for reserved space on the client side.  The most straightforward way to do this is to use the sum of the Used and Available space (as seen in df) as the Total.  The total reported in Solarwinds should then be smaller than the total reported in df by the amount of space that is reserved.
                      2. 1% difference in Solarwinds and df.  Since the df code arbitrarily adds a 1 to the % utilization calculation (see OP) that doesn't seem to be included in the Solarwinds calculation, the utilization % showing in Solarwinds and df will show a 1% difference.  Unlike the variable % differences we were seeing before, a known 1% difference can be easily accounted for in the alert logic.

                       

                      Hopefully I don't run into any problems with the modifications over the weekend and I can put the patch up early next week!

                      • Re: Linux Drive Monitors not Accounting for Reserved Space
                        bobross

                        Well, thanks to interdepartmental politics I've just wasted many, many hours working on this issue.  Sad to say, we won't be implementing this patch in our environment; however, I still wanted to share the patch with the community in case anyone else wants to take advantage of it.  I've attached the patch file I created for Net-SNMP 5.7.1.  There are only about 3 lines of changes between the hr_storage.c and hrh_storage.c files.  Older versions of Net-SNMP might not have the hrh_storage.c file, so if you patch against them you might get failures.  The changes are straightforward enough, though, that most folks should be able to make them by hand if necessary.

                         

                        With that said...

                         

                        I've also gone ahead and submitted a patch to Net-SNMP.org that if implemented will add an extra data point for retrieval by SNMP.  This would be hrStorageAvail and be the storage space available to a non-root user.  This value can then be used with hrStorageUsed to calculate utilization values that do take reserved space into account (used / [used + avail]).  This is really the proper way to get this implemented, but a lot longer of a wait until I see anything happen.  If Net-SNMP does go ahead and implement the change I'll see what I can do about getting RHEL to include the patch in the streams for Net-SNMP 5.3 (RHEL5) and Net-SNMP 5.5 (RHEL 6).

                         

                        You can view the details of the patch submission here: https://sourceforge.net/tracker/?func=detail&aid=3558618&group_id=12694&atid=312694

                         

                        I'd appreciate anyone that can test it out against Net-SNMP 5.7 and leave feedback in the comments section of the patch page.  The more testing and interest that is shown, the more likely it will be to get implemented.  From there we'll be able to work with Solarwinds to get support for this new data and hopefully get our utilization monitors and thresholds working optimally.

                        • Re: Linux Drive Monitors not Accounting for Reserved Space
                          bobross

                          Just a little follow up here.

                           

                          The patch which would add in hrStorageAvail as a data point was rejected by net-snmp.org since it is outside of the original Host-Resources-MIB as established by the IETF.  Next step will be to contact the IETF to see about adding this in... currently getting familiar with their mailing list before I actually send it through, but hopefully we'll eventually be able to get this implemented sometime around the release of NPM 15.

                            • Re: Linux Drive Monitors not Accounting for Reserved Space
                              ttl

                              Is there some way to modify the Alert for this so that the variable adds 5% to the amount it reports? So for example, you set the Alert trigger so that it fires when reported usage is >=85% (which is actually 90% once you add the reserved 5% space), but what can be done to modify the alert message? Currently ours looks like this:

                               

                              The ${Caption} volume on ${Node.Caption} is using ${VolumePercentUsed} of its total size. Free space remaining is ${SQL:Select round((VolumeSpaceAvailable / 1024 / 1024 / 1024),1) from Volumes WHERE VolumeID='${VolumeID}'} GB.

                                • Re: Linux Drive Monitors not Accounting for Reserved Space
                                  aLTeReGo

                                  You could use the "convert value" function to automatically add 5% to collected statistic.

                                   

                                  [Image: Convert Value.png]

                                  • Re: Linux Drive Monitors not Accounting for Reserved Space
                                    chisle

                                    I have found putting this expression in the alert gets a very good approximation to what the actual space consumed is. It gets a little fuzzy at times, but overreporting the % by some small fraction is better than underreporting, especially if you also provide the actual byte counts used and available:

                                     

                                    ${SQL:Select round((VolumePercentUsed / 0.950796999 + 1),3) from Volumes WHERE VolumeID='${VolumeID}'} %.

                                     

                                    On very large volumes (approaching 1T), this can become 100.107% or whatever, but as long as it is scary enough to get someone's attention... and hopefully, we never get to that point, eh?

                                     

                                    Your pointer about embedding an SQL statement really helped. If there is a way to do the same inside the drawing engine that actually plots the graphs, then we'd have a winner.

                                      • Re: Linux Drive Monitors not Accounting for Reserved Space
                                        chisle

                                        Some tweaks....for the alert (I still haven't figured out how to correct the graphs!)

                                         

                                        You can see below that we can nest CASE statements so that we don't report over 100% or less than 0%. The numbers used in the calculation give reasonably good approximations for alerting on volumes up to 1T or so in size. We have not tested with differing journal sizes, GDT blocks, etc.
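
The percentage correction and its clamping can be sketched outside of SQL as follows. This is an illustrative Python version only: the function names are invented, the divisor 0.950796999 and the `+ 1` come from the alert expression in this thread, and the input values in the loop are arbitrary test points.

```python
FACTOR = 0.950796999  # divisor taken from the alert expression in this thread


def corrected_used_percent(raw_percent):
    """raw / FACTOR + 1, capped to the range 0..100."""
    return max(0.0, min(100.0, round(raw_percent / FACTOR + 1, 3)))


def corrected_free_percent(raw_percent):
    """100 minus the corrected used figure, never below zero."""
    return max(0.0, round(100 - (raw_percent / FACTOR + 1), 3))


for raw in (36.0, 71.0, 95.0, 99.0):
    print(raw, corrected_used_percent(raw), corrected_free_percent(raw))
```

Without the cap, any raw value above roughly 94% would be reported as more than 100% used, which is the effect the nested CASE statements are there to hide.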

                                         

                                        Simulated alert:

                                         

                                        To:

                                        someone@somewhere.com,

                                        Subject:

                                        Critical host.somewhere.com-/ < 65.41% free

                                        Message:

                                        Volume host.somewhere.com-/:
                                        Mount Point: /
                                        Total size: 11.7 G
                                        Free space: 7.26 GB
                                        Use%: 34.59%

                                         

                                        NOTE: on Linux, Free space includes the 5% overhead for the OS, therefore the calculation for Percent used needs to have 5% added to it, and thus the numbers above are slightly different than you will find on the OS using the 'df' command.

                                         

                                        Link to the volume details page for more information (note the 5% discrepancy!): http://sw.somewhere.com/Orion/View.aspx?NetObject=V:7171

                                         

                                        Alert Name that triggered this event:
                                        Volume <5% Available

                                         

                                         

                                        Compare to 'df -hP /' output:

                                        Filesystem                 Size  Used Avail Use% Mounted on

                                        /dev/mapper/vgroot-lvroot   12G  3.8G  7.4G  34% /

                                         

                                        Trigger Condition: set to Volume Percent Available is less than 10% (we can use SQL here too, but let's not go crazy!)

                                         

                                         

                                        Trigger Action Email:

                                        Subject: ${N=Alerting;M=Severity} ${FullName} < ${SQL:Select CASE When round(100-(VolumePercentUsed / 0.950796999 + 1),3) < '0' THEN '0.00' ELSE round(100-(VolumePercentUsed / 0.950796999 + 1),3) END AS VolumePercentUsed from Volumes WHERE VolumeID='${VolumeID}'}% free

                                         

                                        Message:

                                        Volume ${FullName}:

                                          Mount Point: ${N=SwisEntity;M=VolumeDescription}

                                          Total size: ${VolumeSize}

                                          Free space: ${SQL:Select CASE When round(( (( 0.950796999 * VolumeSize - 128000000) - VolumeSpaceUsed ) / 1024 / 1024 / 1024),4) < '0' THEN '0.00' ELSE round(( (( 0.950796999 * VolumeSize - 128000000) - VolumeSpaceUsed ) / 1024 / 1024 / 1024),4) END AS VolumeSize from Volumes WHERE VolumeID='${VolumeID}'} GB

                                          Use%: ${SQL:Select CASE When round((VolumePercentUsed / 0.950796999 + 1),3) > '100' THEN '100.00' ELSE round((VolumePercentUsed / 0.950796999 + 1),3) END AS VolumePercentUsed from Volumes WHERE VolumeID='${VolumeID}'}%

                                         

                                        NOTE: on Linux, Free space includes the 5% overhead for the OS, therefore the calculation for Percent used needs to have 5% added to it, and thus the numbers above are slightly different than you will find on the OS using the 'df' command.

                                         

                                        Link to the volume details page for more information (note the 5% discrepancy!): ${VolumeDetailsURL}

                                         

                                        Alert Name that triggered this event:

                                        ${N=Alerting;M=AlertName}
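
The Free space expression in the template above follows the same pattern. Here is a hedged Python rendering of that arithmetic: the function name is made up, the 0.950796999 factor and the 128000000-byte overhead are copied from the SQL, and the 12 GiB volume in the example is a hypothetical size, not the one from the simulated alert.

```python
def corrected_free_gb(volume_size_bytes, used_bytes,
                      factor=0.950796999, overhead_bytes=128_000_000):
    """Estimated free space for non-root users in GiB, never below zero."""
    free = factor * volume_size_bytes - overhead_bytes - used_bytes
    return max(0.0, round(free / 1024 ** 3, 4))


GIB = 1024 ** 3
print(corrected_free_gb(12 * GIB, 4 * GIB))   # a partly used volume
print(corrected_free_gb(12 * GIB, 12 * GIB))  # full volume clamps to 0.0
```

As with the percentage, the clamp only hides the cases where the fixed factor and overhead do not match the real reserved space on a given filesystem.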

                                  • Re: Linux Drive Monitors not Accounting for Reserved Space
                                    aLTeReGo

                                    SAM 6.3 Beta 2 is now available, which includes a Linux Agent for Node, Volume, Interface, and Application monitoring. This agent should address many of the shortcomings associated with monitoring Linux hosts via SNMP, up to and including properly calculating the volume usage statistics referenced in this thread. If you already own Server & Application Monitor and are under active maintenance, you can sign up to participate in the beta at the link below.