
Product Blog

3 Posts authored by: james_honey Employee

In the previous two posts, we talked about high-level performance information and then dove into the details of storage performance at the array, pool, and LUN/Volume level. Now let's talk about thresholds and alerting. This is where we start making Storage Resource Monitor adapt to your environment, while also surfacing the performance information that matters to you.

 

Thresholds

Setting thresholds is a key step in making sure your data center runs efficiently. When you start SolarWinds® Storage Resource Monitor for the first time, preset thresholds are already in place, based on general best practices. For most situations these will work; however, some environments require something a little more specific. Some applications in your environment require low latency, and any deviation causes major headaches. Others require a specific amount of IOPS, and any dip will slow the business down and fill your inbox with not-so-nice requests for information. Having your thresholds set properly can help you avoid "fire drills." The "SRM Settings" section is where you can set global thresholds for key storage resources.

Thresholds can be set for IOPS, throughput, I/O size, capacity, and latency (LUN & Volume specific). In addition, some of these can be set by read, write, or total, so you can even customize for applications that are read-heavy or write-heavy.

Using global settings allows you to tailor monitoring for your data center, but, as you know, some applications differ from the rest and need special attention. If that’s the case, Storage Resource Monitor has you covered. On each details screen (array, pool, and LUN/Volume), you can adjust the thresholds for that specific resource. Say Pool 1 needs to maintain 500 IOPS and you need to know when it is at risk of dropping below that. You can set the warning threshold at IOPS less than or equal to 600 and the critical threshold at IOPS less than or equal to 550. Or say LUN 2 has to keep latency at or below 50ms. You can set the warning threshold at 40ms and the critical threshold at 50ms. The thresholds you set for individual resources carry over to the summary screens we talked about before, so at a glance you can see whether the required performance needs are being met.
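To make the Pool 1 and LUN 2 examples concrete, here is a minimal Python sketch of how a warning/critical threshold pair could be evaluated. It is only an illustration of the logic, not how Storage Resource Monitor implements thresholds; the `Threshold` class and `classify` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
import operator


@dataclass(frozen=True)
class Threshold:
    """Hypothetical warning/critical pair; not an SRM data structure."""
    comparison: str   # "<=" for floors such as IOPS, ">=" for ceilings such as latency
    warning: float
    critical: float


_OPS = {"<=": operator.le, ">=": operator.ge}


def classify(value: float, threshold: Threshold) -> str:
    """Return "critical", "warning", or "ok" for a single measurement."""
    compare = _OPS[threshold.comparison]
    # Check critical first, because a critical value also satisfies the warning test.
    if compare(value, threshold.critical):
        return "critical"
    if compare(value, threshold.warning):
        return "warning"
    return "ok"


# The two examples from the text above.
pool1_iops = Threshold("<=", warning=600, critical=550)
lun2_latency_ms = Threshold(">=", warning=40, critical=50)

print(classify(580, pool1_iops))      # between 550 and 600
print(classify(45, lun2_latency_ms))  # between 40 and 50
```

Note that the comparison direction differs by metric: IOPS is a floor you do not want to fall through, while latency is a ceiling you do not want to reach.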

 

Alerting

So now you’re thinking, "thresholds are great, but when the custom thresholds are reached I need to be alerted." In addition to custom thresholds, setting custom alerts will make sure you know quickly when something goes wrong. As before, the standard alerts in Storage Resource Monitor will get you going; however, custom alerts help you confirm that all of your resources are performing as required. Custom alerts can be created for groups of resources with the same performance profile or for specific resources that have a unique requirement.

 

You can set a single alert for a specific storage resource or set an alert for multiple resources that share a common performance profile. You can customize everything from which team handles the alert, to requiring that the condition exist for a period of time, to enabling the alert only during a certain time of day, to name a few options. Setting a custom alert for a specific time helps avoid unwanted alerting noise during expected downtime and/or planned degraded performance.
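The two alert conditions mentioned above, a condition that must persist for a period of time and an alert that is only enabled during certain hours, can be sketched in a few lines of Python. This is an illustrative model under my own assumptions, not SRM's alerting engine; `AlertRule` and its fields are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class AlertRule:
    """Hypothetical alert rule; not an SRM object."""
    sustain: timedelta        # how long the condition must hold before firing
    active_from: time         # start of the daily window in which the alert is enabled
    active_until: time        # end of that window (exclusive)
    breach_started: Optional[datetime] = None

    def in_window(self, now: datetime) -> bool:
        t = now.time()
        if self.active_from <= self.active_until:
            return self.active_from <= t < self.active_until
        # The window wraps past midnight, e.g. 22:00 to 06:00.
        return t >= self.active_from or t < self.active_until

    def should_fire(self, breached: bool, now: datetime) -> bool:
        # Outside the window, or once the condition clears, forget any earlier breach.
        if not breached or not self.in_window(now):
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.sustain


# Enabled 08:00 to 18:00; the condition must hold for 10 minutes.
rule = AlertRule(sustain=timedelta(minutes=10),
                 active_from=time(8), active_until=time(18))
start = datetime(2024, 1, 1, 9, 0)
print(rule.should_fire(True, start))                          # breach just began
print(rule.should_fire(True, start + timedelta(minutes=10)))  # sustained long enough
```

In this sketch a breach that begins outside the enabled window is not carried into it; whether that is the behavior you want depends on how you treat planned downtime.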

 

 

By using thresholds and custom alerts, Storage Resource Monitor has you covered when monitoring storage performance for all your applications. Along with dashboards and storage resource details, you can easily stay ahead of your storage performance needs and track when more resources are needed.

 

What are some of your best practices around thresholds? What are the items you customize with alerts?

Having a high-level view of storage performance is good for a quick overview or understanding of how things are operating.  In order to take your monitoring to the next level, having access to details is critical. In my previous post,  I reviewed storage dashboards and performance data points that SolarWinds Storage Resource Monitor provides.  Below I will cover performance monitoring at the array, storage pool, and LUN/Volume level.

The "Array Details" screen is usually the first stop when looking at your storage performance. This is a great starting point for when you want to get a look at the overall performance for a storage array. Having this information is ideal when you want to compare the expected performance of an array versus how the array is actually performing.  In addition, you can get an understanding of read/write performance ratios in relation to the overall performance.

 

The “Block Storage” and “File Storage” tabs allow you to quickly get into the underlying performance information for the device’s storage pools and LUNs/Volumes.  Each of these tabs will show you latency summaries and performance summaries for the individual resources.  At a glance, this will let you see if you have any latency issues at the LUN/Volume level and what your highest performing LUNs/Volumes are by IOPS, throughput, or latency.

 

 

The "Storage Pool Details" screen gives storage administrators the ability to understand performance at a pool/RAID level. Depending on how storage resources are assigned to applications, this can help you understand performance for similar applications. An example would be a VM farm created for different instances of the same application. Tying those applications to the same pool of storage with different LUNs is ideal, because the pool then sees one consistent read/write ratio instead of a mix of different ratios. Mixed patterns can cause application performance problems if the disk has to store random data in one instance and sequential data the next.

The "LUN & Volume Details" screen is where you can see performance at the lowest level.  This is where you can tie application performance directly to the assigned storage. In addition, this is where the power of Storage Resource Monitor really comes into play.  Not only can you see the individual LUN performance, you can also see it in relation to other LUNs in the same storage pool.  Did a LUN in the same pool spike performance?  Are all the LUNs in the same pool experiencing high latency?  These are a couple of questions the LUN Details screen can help answer.

 

 

As you can see, the more in-depth you go with Storage Resource Monitor, the more information and comparisons become available.  All of the information presented is critical to understanding your storage performance and how it affects your overall environment.  In my next post, I will cover thresholds & alerting and how with the right settings & planning you can make Storage Resource Monitor not just an important monitoring tool, but a critical one.

 

How have you used the details screens to monitor and troubleshoot your storage performance?

Managing storage is a constant dance of making sure resources are available for the applications that need them, and making sure those resources are constantly in use, because wasted resources, just like a lack of resources, can be a problem. SolarWinds® Storage Resource Monitor helps make this dance a little less complicated. Over the next few posts I am going to show not only different parts of Storage Resource Monitor in relation to storage performance, but also how each of these parts can give you the information you need to monitor your environment and maximize one of your largest IT investments.

 

To start, we will address some basic information regarding storage performance and how Storage Resource Monitor presents the data. Based on customer feedback, one of the best things about SRM is that users are able to quickly view and understand their storage performance problems. Below, I will show you what initial performance information SRM provides, and ways to interpret the data. Depending on your environment, there will always be different ways to interpret performance data, so your mileage will vary.

 

Here we have part of the SRM Summary screen. In one simple view you get a list of storage devices being monitored, alerts, events, and performance and capacity summaries. The All Storage Objects widget will not only show you all the storage devices, but also point to devices that are having problems using easy-to-see green, yellow, and red notifications. To get to the exact cause, you can drill down into the array data until you get to the specific storage resource with the problem. A faster way to recognize performance problems is with either All Active Alerts or Storage Objects by Performance Risk.

                        

 

The Storage Objects by Performance Risk section gives you a summary of performance problems based on and sorted by latency. High latency is never an ideal situation; however, the definition of "high" varies by environment and application. In addition to latency, IOPS and throughput are shown, and you can tailor the thresholds for the resources to be more specific to your requirements. This lets you spot your top performance problems by latency on the main screen without any digging.
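As a rough illustration of a latency-sorted risk list, the Python sketch below ranks a handful of LUN samples by latency, highest first. The `LunSample` type, the `rank_by_latency` function, and the sample values are all made up for this example; they do not come from SRM.

```python
from typing import List, NamedTuple


class LunSample(NamedTuple):
    """Hypothetical snapshot of one LUN's performance counters."""
    name: str
    latency_ms: float
    iops: float
    throughput_mbps: float


def rank_by_latency(samples: List[LunSample], top_n: int = 5) -> List[LunSample]:
    """Return the top_n samples with the highest latency, worst first."""
    return sorted(samples, key=lambda s: s.latency_ms, reverse=True)[:top_n]


# Made-up values purely for illustration.
samples = [
    LunSample("lun_a", latency_ms=12.0, iops=900, throughput_mbps=70.0),
    LunSample("lun_b", latency_ms=48.5, iops=300, throughput_mbps=25.0),
    LunSample("lun_c", latency_ms=3.2, iops=1500, throughput_mbps=110.0),
]

for s in rank_by_latency(samples, top_n=2):
    print(s.name, s.latency_ms, s.iops, s.throughput_mbps)
```

IOPS and throughput ride along with each row so that, as on the dashboard, a high-latency LUN can be read in context rather than in isolation.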

 

                             

In addition to the performance information on the SRM Summary screen, the Performance Dashboard lets you see additional performance data points. It includes the performance objects by risk and information for LUNs by Performance and NAS Volumes by Performance. Any of these sections will allow you to instantly dig into the specific storage resource that is experiencing performance problems.

              

 

This data allows you to instantly address performance problems. To see overall performance at the array and/or storage pool level, SRM gives you access to that data in a mere one or two clicks.  For array-specific performance information, select an array in the All Storage Objects section and the Array Details screen will show detailed information for that array. Clicking once more in the All Storage Objects section will show the storage pools and allow you to select the Storage Pool Details screen for each pool. Going even lower will show all the LUNs assigned to each pool.  Selecting a LUN will bring up the LUN Details screen.   Each of these screens will present specific performance information as it relates to that storage resource.

 

Array Details

 

Storage Pool Details

 

LUN Details

 

Now, what do these high-level performance views do for the end-user? Right from the start, you can instantly discover, identify, and start troubleshooting performance problems. The goal is that the critical problems are up front, and the need to check each storage device one by one for problems is eliminated. In addition, having the ability to customize the dashboards and information is critical to tailoring the monitoring to your needs.

 

My next post will cover the three specific layers we use to help you monitor your storage performance: array, storage pool, and LUN/volume.

 

I would love to hear your feedback about how SRM has helped you monitor your storage performance. Please leave comments and questions below.
