Thresholds and Alerting: Where the Magic Happens

In the previous two posts, we covered high-level performance information and then dove into the details of storage performance at the array, pool, and LUN/volume level. Now let's talk about thresholds and alerting. This is where Storage Resource Monitor starts adapting to your environment and surfacing the performance information that matters to you.

Thresholds

Setting thresholds is a key step in making sure your data center runs efficiently. When you start SolarWinds® Storage Resource Monitor for the first time, preset thresholds based on general best practices are already in place. For most situations these will work; however, some workloads require something more specific. Some applications in your environment require low latency, and any deviation causes major headaches. Others require a specific number of IOPS, and any dip will slow the business down and fill your inbox with not-so-nice requests for information. Having your thresholds set properly can help you avoid "fire drills." The "SRM Settings" section is where you can set global thresholds for key storage resources.

Thresholds can be set for IOPS, throughput, I/O size, capacity, and latency (LUN and volume specific). In addition, some of these can be set for read, write, or total, so you can even customize for applications that are read-heavy or write-heavy.

Global settings let you tailor monitoring to your data center, but, as you know, some applications differ from the rest and need special attention. If that’s the case, Storage Resource Monitor has you covered. On each details screen (array, pool, and LUN/volume), you can adjust the thresholds for that specific resource. Say Pool 1 needs to maintain 500 IOPS and you need to know before it drops below that. You can set the threshold to warning when IOPS are less than or equal to 600 and critical when IOPS are less than or equal to 550. Or say LUN 2 has to keep latency within 50 ms. You can set the threshold to warning when it hits 40 ms and critical when it hits 50 ms. The thresholds you set for individual resources carry over to the summary screens we talked about before, so you can see at a glance whether the required performance is being met.
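
To make the override logic concrete, here is a minimal Python sketch of how a per-resource threshold could take precedence over a global one. It is only an illustration of the idea, not how Storage Resource Monitor implements it: the Threshold class, the evaluate function, and the global default values are all hypothetical. Only the Pool 1 and LUN 2 numbers come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Threshold:
    warning: float
    critical: float
    # True for "floor" metrics such as IOPS, where LOW values are the problem.
    breach_when_below: bool = False

    def status(self, value: float) -> str:
        if self.breach_when_below:
            if value <= self.critical:
                return "critical"
            if value <= self.warning:
                return "warning"
        else:
            if value >= self.critical:
                return "critical"
            if value >= self.warning:
                return "warning"
        return "ok"


# Hypothetical global defaults (illustrative values, not the product's presets).
GLOBAL_DEFAULTS = {
    "latency_ms": Threshold(warning=20, critical=30),
}

# Per-resource overrides, using the numbers from the Pool 1 / LUN 2 example.
OVERRIDES = {
    ("Pool 1", "iops"): Threshold(warning=600, critical=550, breach_when_below=True),
    ("LUN 2", "latency_ms"): Threshold(warning=40, critical=50),
}


def effective_threshold(resource: str, metric: str) -> Optional[Threshold]:
    """A resource-specific override wins; otherwise fall back to the global default."""
    return OVERRIDES.get((resource, metric), GLOBAL_DEFAULTS.get(metric))


def evaluate(resource: str, metric: str, value: float) -> str:
    threshold = effective_threshold(resource, metric)
    return "unknown" if threshold is None else threshold.status(value)
```

With these assumed values, evaluate("Pool 1", "iops", 575) reports a warning well before the pool drops to its 500 IOPS requirement, while a LUN without an override is judged against the global latency default.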

Alerting

So now you’re thinking, "Thresholds are great, but when a custom threshold is reached, I need to be alerted." Alongside custom thresholds, custom alerts make sure you find out quickly when something goes wrong. As before, the standard alerts in Storage Resource Monitor will get you going, but custom alerts help confirm that all of your resources are performing as required. You can create custom alerts for groups of resources with the same performance profile or for specific resources that have a unique requirement.

You can set a single alert for a specific storage resource or one alert for multiple resources that share a common performance profile. You can customize everything from which team handles the alert, to requiring that the condition persist for a period of time, to enabling the alert only during certain times of day, to name a few options. Restricting a custom alert to specific times helps avoid unwanted alert noise during expected downtime or planned periods of degraded performance.
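
The two conditions just described, "the condition has to exist for a period of time" and "only enabled during certain times of day," can be sketched in a few lines of Python. This is a hypothetical model for illustration only; AlertRule and its fields are made-up names and do not reflect the product's alert engine or API.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class AlertRule:
    sustain: timedelta      # how long the breach must persist before firing
    active_from: time       # start of the daily window in which the alert is enabled
    active_until: time      # end of that window (exclusive)
    breach_started: Optional[datetime] = None

    def _in_window(self, now: datetime) -> bool:
        t = now.time()
        if self.active_from <= self.active_until:
            return self.active_from <= t < self.active_until
        # Window wraps past midnight, e.g. 22:00 to 06:00.
        return t >= self.active_from or t < self.active_until

    def observe(self, now: datetime, breached: bool) -> bool:
        """Feed one sample; return True if the alert should fire for it."""
        if not breached:
            self.breach_started = None  # condition cleared, reset the timer
            return False
        if self.breach_started is None:
            self.breach_started = now
        sustained = now - self.breach_started >= self.sustain
        return sustained and self._in_window(now)
```

In this sketch a brief spike never fires because the timer resets as soon as the condition clears, and a breach during a maintenance window outside the active hours stays silent. Note that observe returns True for every sample once the breach is sustained; a real system would typically also deduplicate notifications.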

By using thresholds and custom alerts, Storage Resource Monitor has you covered when monitoring storage performance for all your applications. Along with dashboards and storage resource details, you can easily stay ahead of your storage performance needs and track when more resources are needed.

What are some of your best practices around thresholds? What are the items you customize with alerts?

Anonymous

Top Comments

  • @jfb thank you. Yesterday I was able to find the table (Orion.SRM.LUNThresholds) and if I change one manually (from global warning 2 to 10 and critical from 2.5 to 15) I can see these columns change: 

    ThresholdType changes from 0 to 2
    ThresholdOperator changes from 1 to 0
    Level1Value changes from 2 to 10
    Level1Formula changes from NULL to 10
    Level2Value changes from 2.5 to 15
    Level2Formula changes from NULL to 15

    Does that appear to be everything on these?  It sure would be nice if there was a UI to select multiple LUNs and just adjust. Feature request suggestion. 

  • Your instinct is correct. You would need to do that via a direct query. Currently there is no way to do it through the UI or via the SDK. I don't have a query offhand to offer but if you look at the tables in either SWQL Studio or Database Manager, you should be able to easily find the tables where those thresholds are stored.

  •  I am no longer at SolarWinds, but I think  might be able to point you in the right direction.

  •  This is great, however how do you set multiple LUN overrides?  I have 35 LUNs that need a different threshold than the global(all the same). Going into each manually seems cumbersome.  Sometimes there are hundreds of LUNs that all need the same threshold adjusted. How is this handled?  The only way I can think of right now is by writing a SQL query.  If that is the only route, does anyone already have the SQL query written that they can share? (I can hunt and peck for the tables otherwise.)  The specific counter I'm looking to override is the Total Latency.  Thank you.