21 Replies Latest reply on Jul 17, 2014 8:44 AM by tmeadows

    Alert on disk failure

    Deltona

      Hi,

       

      Could someone from the STM team please provide a step by step guide on how to configure a failed disk alert?

      I need an alert when one or more disks in my SAN have failed. I would like to receive the alert by email, so I will also need a guide on how to set up the mail part.

      The contents of the alert should include the following:

       

      Alert Name :: Alert Type :: Disk Name :: Disk Status :: Array Name :: Timestamp

       

      Is this possible?
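For illustration, the requested alert line can be sketched in a few lines of Python. This is only a sketch of the layout asked for above: the class, its field names, and the sample values are all hypothetical and do not come from Storage Manager.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class DiskAlert:
    # Hypothetical field names; they mirror the requested layout only.
    alert_name: str
    alert_type: str
    disk_name: str
    disk_status: str
    array_name: str
    timestamp: datetime

    def to_line(self) -> str:
        """Join the six fields with ' :: ' in the requested order."""
        fields = [
            self.alert_name,
            self.alert_type,
            self.disk_name,
            self.disk_status,
            self.array_name,
            self.timestamp.isoformat(),
        ]
        return " :: ".join(fields)


# Sample values are made up for the example.
alert = DiskAlert(
    alert_name="Disk failure",
    alert_type="Threshold",
    disk_name="Disk 7",
    disk_status="down",
    array_name="ARRAY-01",
    timestamp=datetime(2014, 1, 1, 12, 0, tzinfo=timezone.utc),
)
print(alert.to_line())
```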

        • Re: Alert on disk failure
          alexslv

          What type of SAN do you have? If it is Dell EqualLogic, I can help. If it is not, I can still show you what I did with EqualLogic, and you can work out an analogous solution for your SAN.

          • Re: Alert on disk failure
            Deltona

            Bump.

             

            Can we please get an official SolarWinds KB article for this posted here: http://knowledgebase.solarwinds.com/kb/ ?

              • Re: Alert on disk failure
                KMSigma

                I'm in the process of researching, testing, and then drafting up a "How-To."  Please stay posted for the update.

                • Re: Alert on disk failure
                  animelov

                  Give me a little bit, I'm running into a meeting. In the meantime, use this KB as a stop-gap:

                   

                  http://knowledgebase.solarwinds.com/kb/questions/2054/How+to+set+Thresholds+and+Alerts+in+Profiler.

                   

                  Instead of selecting a threshold, create an asset change rule; the process is otherwise the same.

                    • Re: Re: Alert on disk failure
                      animelov

                      Sorry, this is going to be lengthy with screenshots.

                       

                      OK, apparently I was wrong about the location, but the good news is that there is still a way to do it, and the KB article is actually more relevant.  It IS a threshold rule, not an asset change rule.  Overall, setting up a rule in Storage Manager is a three-step process.  Step One: go to Settings in the upper left, then "All Rules":

                       

                      settings.png

                       

                      After that, click on "Add new rule":

                      add_new_rule.png

                       

                      Then "Threshold Rule"

                      threshold_rules.png

                       

                      From there, give the rule a name.  Then click the drop-down for "Section" and change it to "Storage Array".  Set the Category to "Disk Drive Asset".  After that, "Operational Status" is the only option, so set the condition to == down.

                       

                      edit_rule.png

                       

                      Alright, so, that ends the rule creation.
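To make the rule's logic concrete, here is a minimal Python sketch of what a condition like "Operational Status == down" amounts to. It is not Storage Manager's internal schema or API: the ThresholdRule class, the dictionary keys, the sample disks, and the case-insensitive comparison are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThresholdRule:
    # Hypothetical model of the rule form described above.
    name: str
    section: str
    category: str
    attribute: str
    expected: str

    def matches(self, disk: dict) -> bool:
        """Return True when the disk's attribute equals the expected value.

        Comparing case-insensitively is an assumption of this sketch.
        """
        actual = str(disk.get(self.attribute, ""))
        return actual.strip().lower() == self.expected.lower()


rule = ThresholdRule(
    name="Disk failure",
    section="Storage Array",
    category="Disk Drive Asset",
    attribute="Operational Status",
    expected="down",
)

# Made-up sample data standing in for collected disk assets.
disks = [
    {"Disk Name": "Disk 1", "Operational Status": "OK"},
    {"Disk Name": "Disk 2", "Operational Status": "Down"},
]
failed = [d["Disk Name"] for d in disks if rule.matches(d)]
print(failed)
```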

                       

                      Step Two: apply the rule to a policy.  For a little background, a policy is used to make configuration changes in bulk.  Policies can also be used to push rules and to restrict those rules and configurations to particular devices.

                       

                      First, go to Settings --> Policies:

                      policies.png

                       

                      Then click on the policy you want to apply it to.  In this particular example, it will need to be the SMI-S policy (or NetApp, in the case of NetApp).  You can also create a new one, if you only want to push to specific devices:

                       

                      edit_policies.png

                       

                      Then click on Rules.  Apply the rule that you just saved.  Save, then push (sorry, getting lazy on screenshots)

                       

                      Finally, Step Three: set up a user to receive emails, if you haven't already.  Go to Settings --> Manage Users:

                       

                      users.png

                       

                      Edit an existing user, or create a new one.  On the edit screen, make sure an email address is set up.  Once done, click "Add" next to Notifications:

                       

                      edit_user.png

                       

                      Select the device or group of devices you wish to get notifications for.  By default, every alert comes in as a warning; this can be changed in the event viewer after the first alert has fired.
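Storage Manager sends the notification mail itself once the user and notification are configured, so no code is needed for this step. Purely as an illustration of what such a warning mail might carry, here is a sketch using Python's standard email library. The addresses, the subject line, and the function name are hypothetical, and nothing here is sent.

```python
from email.message import EmailMessage


def build_alert_email(sender: str, recipient: str, alert_line: str) -> EmailMessage:
    """Build (but do not send) a plain-text alert mail."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    # "Warning" reflects the default severity mentioned above.
    msg["Subject"] = "[Warning] Disk failure alert"
    msg.set_content(alert_line)
    return msg


# Hypothetical addresses and alert text.
msg = build_alert_email(
    "stm@example.com",
    "storage-admin@example.com",
    "Disk failure :: Threshold :: Disk 7 :: down :: ARRAY-01",
)
print(msg)
```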

                       

                      Let me know if you have any questions!

                        • Re: Alert on disk failure
                          Deltona

                          Hi,

                           

                          I've gone through the steps and have a few questions.

                           

                          1. At Rule Creation, when selecting Any Instance (circled in image), will this still work when adding new storage arrays that aren't currently present in Storage Manager?

                          2. What exactly is the Operational Status condition looking at (circled in image): the integer or the status "Name"? According to the SMI-S specification, Operational Status is an integer. Does Storage Manager convert the integer to text, and if so, should I base the condition on the integer or on the converted text?

                           

                          Alert on Disk Failure.png

                           

                           

                          Have a look at page 439 of this document: http://snia.org/sites/default/files/SMI-Sv1.5r6_Block.book_.pdf (Table 250, which continues on page 440).

                          Operational Statuses include: OK, Predictive Failure, Error, Starting, Stopping, Stopped.

                           

                          3. Quote: "Then click on Rules.  Apply the rule that you just saved.  Save, then push"

                           

                          I can't find the Rules button. There are four *Rules buttons on the settings page that might make sense, but I can't be sure which one is meant.

                          How do I push?

                           

                          4. When selecting Notifications under the user, what do I need to select in order to get this alert for disks in systems that haven't been added yet? Would the following do the trick, provided the storage array newly added to Storage Manager was added to a group as well? What would happen if it weren't added to a group?

                           

                          Alert on Disk Failure Notification.png
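On question 2, the kind of integer-to-name translation being asked about can be sketched as follows. The integer codes are the ones the CIM OperationalStatus value map assigns to the six statuses listed above; they are worth checking against Table 250. Whether Storage Manager uses these exact names internally is not established here, and the function name and the "Unmapped" label are hypothetical.

```python
# Subset of the CIM OperationalStatus value map, limited to the six
# statuses named in this thread.
OPERATIONAL_STATUS_NAMES = {
    2: "OK",
    5: "Predictive Failure",
    6: "Error",
    8: "Starting",
    9: "Stopping",
    10: "Stopped",
}


def status_names(codes):
    """Translate integer status codes to names; unknown codes stay visible."""
    return [OPERATIONAL_STATUS_NAMES.get(c, f"Unmapped ({c})") for c in codes]


# OperationalStatus is an array in CIM, so a list of codes is translated.
print(status_names([2]))
print(status_names([5, 6]))
```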

                            • Re: Alert on disk failure
                              animelov

                              1. At Rule Creation, when selecting Any Instance (circled in image), will this still work when adding new storage arrays that aren't currently present in Storage Manager?
                              Yes, provided a few conditions hold:

                                   - You select "Any Instance" (which you mention)

                                   - The rule is applied to the "Default SMI-S Policy"

                                   - The user is set up to receive emails for the new device in question

                               

                               

                              2. What exactly is the Operational Status condition looking at (circled in image): the integer or the status "Name"? According to the SMI-S specification, Operational Status is an integer. Does Storage Manager convert the integer to text, and if so, should I base the condition on the integer or on the converted text?

                                   - It is stored in the database as a status name, so we make the translation before it hits the database.  So: string, not integer.

                               

                              3. Quote: "Then click on Rules.  Apply the rule that you just saved.  Save, then push"

                               

                              I can't find the Rules button. There are four *Rules buttons on the settings page that might make sense, but I can't be sure which one is meant.

                              How do I push?

                                   - Yeah, I guess I shouldn't have been lazy :

                                   policy.png

                               

                              add_rules.png

                               

                              push.png

                               

                              4. When selecting Notifications under the user, what do I need to select in order to get this alert for disks in systems that haven't been added yet? Would the following do the trick, provided the storage array newly added to Storage Manager was added to a group as well? What would happen if it weren't added to a group?

                                   - Yes, you'd want to add it to a group.  The only other way around it is to add the unassigned group as the notification, but that would produce a flood of messages, so it is inadvisable.
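The conditions from answers 1 and 4 can be summarised as a small check. This is a sketch of the reasoning only, not a Storage Manager API: the Setup class, its fields, and the group names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Setup:
    # Hypothetical summary of the configuration described in this thread.
    rule_instance: str
    rule_policies: set = field(default_factory=set)
    notified_groups: set = field(default_factory=set)


def new_array_is_covered(setup: Setup, array_groups: set) -> bool:
    """True when a newly added array would trigger the alert email."""
    any_instance = setup.rule_instance == "Any Instance"
    in_default_policy = "Default SMI-S Policy" in setup.rule_policies
    # The user must be notified for at least one group the array belongs to.
    user_notified = bool(setup.notified_groups & array_groups)
    return any_instance and in_default_policy and user_notified


setup = Setup(
    rule_instance="Any Instance",
    rule_policies={"Default SMI-S Policy"},
    notified_groups={"Production SAN"},
)
print(new_array_is_covered(setup, {"Production SAN"}))
print(new_array_is_covered(setup, set()))
```

The second call models an array that was not added to any group: the rule still applies, but no notification the user subscribed to matches it.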

                                • Re: Alert on disk failure
                                  Deltona

                                  Thanks for that!

                                   

                                  Now how do I get the Storage Array icon to go yellow or red, or to have an exclamation mark on it?

                                   

                                  notgood.gif

                                    • Re: Alert on disk failure
                                      dsbalcau

                                      Hi Deltona,

                                       

                                      That icon you are referring to is not "Array Status". Rather, it reflects "Collection Status". When you add a device to STM, the icon is grey until the device has been collected; it is green if everything is collecting fine, and it goes red if there are problems. That status is reflected on the new STM Health and Status page in 5.7 as well.
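The colour logic described here can be sketched as below. It is a reading of the explanation above rather than actual STM code; the enum and function names are hypothetical. Note that nothing about disk state feeds into the result, which is why a failed disk leaves the icon green.

```python
from enum import Enum


class CollectionStatus(Enum):
    NOT_YET_COLLECTED = "grey"
    COLLECTING = "green"
    PROBLEM = "red"


def icon_color(has_collected: bool, collection_ok: bool) -> str:
    """Return the icon colour from collection state alone."""
    if not has_collected:
        return CollectionStatus.NOT_YET_COLLECTED.value
    if collection_ok:
        return CollectionStatus.COLLECTING.value
    return CollectionStatus.PROBLEM.value


print(icon_color(has_collected=False, collection_ok=False))
print(icon_color(has_collected=True, collection_ok=True))
print(icon_color(has_collected=True, collection_ok=False))
```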

                                       

                                      Overall Array status is something we are looking at showing in the future, but there are no timeframes at this point.

                                        • Re: Alert on disk failure
                                          Deltona

                                          Hi Balki,

                                           

                                          Disk failure is the worst that can happen on a storage array. Collection status in this case doesn't come close to my priorities when a disk fails.

                                          The STM Health and Status page is still green even when a disk has failed, so this page is useless to me. I have requested this alert for the past three years. I have finally gotten documentation on how an alert for disk failure is done, and I hope I don't have to wait another three years to get a visual notification about this type of issue. At this point, my eyes are dried out from all the looking.

                                           

                                          There's a script that can be executed if this threshold is met, the "Run server Script". Would it be remotely possible to execute a script that changes the icon? This is low tech but at least it would get the job done.
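As a low-tech illustration of the idea, a script triggered by the rule could at least record the failure somewhere an external dashboard can read. The sketch below does not change the Storage Manager icon, and the thread does not establish how "Run server Script" passes arguments, so the command-line layout, the file format, and every name here are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def record_failure(status_file: Path, array_name: str, disk_name: str) -> dict:
    """Append one failure record to a JSON-lines file and return it."""
    record = {
        "array": array_name,
        "disk": disk_name,
        "status": "down",
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    }
    with status_file.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")
    return record


def main(argv: list) -> int:
    # Hypothetical invocation: script.py <status_file> <array_name> <disk_name>
    if len(argv) != 4:
        return 2
    record_failure(Path(argv[1]), argv[2], argv[3])
    return 0
```

To use it as a standalone script, call main(sys.argv) from the entry point and pass the result to sys.exit.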

                                           

                                          This thread does not count until a real kb article explaining the steps and caveats has been posted to the knowledge base. Better yet, why isn't this configured out of the box? I am really interested in the explanation.

                                           

                                          Really hoping this gets done right....

                                            • Re: Alert on disk failure
                                              dsbalcau

                                              Hi Deltona,

                                               

                                              Yes, OOTB disk failure alerting (it's there, as explained above, just not enabled OOTB) is definitely on the roadmap.

                                               

                                              We are currently working on getting the information in this thread into the Admin Guide. I took your KB and raised you one!

                                               

                                              Seriously though, thanks for your feedback, these are great points.

                                                • Re: Alert on disk failure
                                                  Deltona

                                                  Thank you for reading!

                                                   

                                                  Please do look into the possibility of changing the array icon when a threshold is met.

                                                   

                                                  Quote "There's a script that can be executed if this threshold is met, the "Run server Script". Would it be remotely possible to execute a script that changes the icon? This is low tech but at least it would get the job done."