Storage Manager relies on querying the array vendor's SMI-S providers for all data pertaining to the array. The Storage Manager development team tracks the new metrics that vendors add in each new version of their providers so that we can expose them to end users through the Storage Manager GUI. As I mentioned in the release blog post for Storage Manager 5.3.1, we added a ton of new metrics for IBM SVC, EMC VMAX (including latency!), and EMC VNX. In this post, I'd like to take a little time to outline these new metrics.

 

IBM SVC

 

The IBM SVC architecture presents a front-end interface to virtualize the storage arrays that are sitting behind the SVC. Key to making this architecture work is the ability to efficiently manage the cache, especially the write cache. This enables hosts to get acknowledgement that a write has completed without needing to wait for that write to be destaged to back-end storage. Monitoring the efficiency of the cache is therefore important to managing the overall responsiveness of the entire SVC storage system.

 

Cache Performance Metrics

 

These cache metrics can all be found in the Node Performance report under the Performance tab.

[Screenshot: Node Performance report]

 

The following statistics are collected for the overall cache on a per node basis:

 

  • Cache CDCBs - Demote Ready List - Current count of cache directory control blocks (CDCBs) on the demote ready list
  • Cache CDCBs - Global Copies List - Current count of CDCBs on the global copies list
  • Cache CDCBs - Modified List - Current count of CDCBs on the modified list, including cache partitions
  • Average Write Cache Destage Latency - Average write cache destage latency in milliseconds for the statistics collection period
  • Lowest Write Cache Destage Latency - Lowest write cache destage latency in milliseconds for the statistics collection period
  • Highest Write Cache Destage Latency - Highest write cache destage latency in milliseconds for the statistics collection period
  • Average Prestage/Readahead Latency - Average prestage/read-ahead latency in milliseconds for the statistics collection period
  • Lowest Prestage/Readahead Latency - Lowest prestage/read-ahead latency in milliseconds for the statistics collection period
  • Highest Prestage/Readahead Latency - Highest prestage/read-ahead latency in milliseconds for the statistics collection period
  • Average Read Cache Stage Latency - Average read cache stage latency for read-miss I/Os in milliseconds for the statistics collection period
  • Lowest Read Cache Stage Latency - Lowest read cache stage latency for read-miss I/Os in milliseconds for the statistics collection period
  • Highest Read Cache Stage Latency - Highest read cache stage latency for read-miss I/Os in milliseconds for the statistics collection period
  • Average Cache Fullness - Average cache fullness in percent for the statistics collection period
  • Lowest Cache Fullness - Lowest cache fullness in percent for the statistics collection period
  • Highest Cache Fullness - Highest cache fullness in percent for the statistics collection period
  • Average Write Cache Fullness - Average write cache fullness in percent for the statistics collection period
  • Lowest Write Cache Fullness - Lowest write cache fullness in percent for the statistics collection period
  • Highest Write Cache Fullness - Highest write cache fullness in percent for the statistics collection period
  • Average Data Transfer Latency - Average data transfer latency in milliseconds for the statistics collection period
  • Lowest Data Transfer Latency - Lowest data transfer latency in milliseconds for the statistics collection period
  • Highest Data Transfer Latency - Highest data transfer latency in milliseconds for the statistics collection period
  • Average Track Access Latency - Average begin-track-access latency in milliseconds for the statistics collection period
  • Lowest Track Access Latency - Lowest begin-track-access latency in milliseconds for the statistics collection period
  • Highest Track Access Latency - Highest begin-track-access latency in milliseconds for the statistics collection period
  • Average Track Lock Latency - Average track lock latency in milliseconds for the statistics collection period
  • Lowest Track Lock Latency - Lowest track lock latency in milliseconds for the statistics collection period
  • Highest Track Lock Latency - Highest track lock latency in milliseconds for the statistics collection period

 

CPU

 

Monitoring the CPU on any array is important to understanding whether the array can handle the load hosts are putting on it. The CPU on the IBM SVC will always read 100% if monitored from the OS, because it spends its spare cycles scanning the Fibre Channel fabric. IBM has provided the following counter, which lets you see how much time the CPU actually spends servicing I/O requests.

 

CPU Utilization - This statistic reports the pseudo-CPU utilization. It can also be found under Node Performance with the Cache Performance statistics.

 

Port Performance

 

[Screenshot: Port Performance report]

The following stats are reported per node for each of the four ports of an SVC node, and can be found in the Port Performance report under the Performance tab:

 

Commands Initiated to Controllers - Commands initiated to controllers (targets) [always zero but provided for completeness]

Commands Initiated to Hosts - Commands initiated to hosts (initiators)

Commands Received from Controllers - Commands received from controllers (targets) [probably always zero but provided for completeness]

 

The following stats are provided primarily for debugging suspected fabric issues. Each of the statistics below (except Zero Buffer-Buffer Credit Timer) is a cumulative count of occurrences since the last node reset.

 

Link Failure Count

Loss-of-synchronization Count

Loss-of-signal Count

Primitive Sequence Protocol Error Count

Invalid Transmission Word Count

Invalid CRC Count

Zero Buffer-Buffer Credit Timer - This counter is reset after each collection interval. It should be shown as a percentage of the interval time: (bbcz / (interval in microseconds)) * 100
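Since the counter itself is raw microseconds, a tiny helper (hypothetical, not part of Storage Manager, with invented numbers) makes the suggested conversion concrete:

```python
def zero_bb_credit_pct(bbcz_us: float, interval_s: float) -> float:
    """Express the zero buffer-to-buffer credit timer (bbcz, in
    microseconds) as a percentage of the collection interval."""
    interval_us = interval_s * 1_000_000
    return (bbcz_us / interval_us) * 100

# A port starved of credits for 3 of the 300 seconds in an interval
# spent 1% of the interval unable to transmit.
print(zero_bb_credit_pct(3_000_000, 300))  # → 1.0
```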

 

MDisk Group Performance and Utilization

 

 

IBM SVC MDisk groups are collections of MDisks tied to a set of associated VDisks; they are much like storage pools in other array architectures. Understanding the load on the entire MDisk group can be incredibly important when tracking down issues in your storage environment: although one VDisk may not be busy itself, it can easily be starved by a "noisy neighbor." Without a full picture of which MDisks (and VDisks) sit on shared resources, it can be nearly impossible to make those connections.

[Screenshot: MDisk Group Performance report]

In Storage Manager 5.3.1, we created new performance reports specifically for the MDisk Group. Within the MDisk Group Performance chart, you can report on all of the standard performance metrics for that group, including:

  • Overall Response Time
  • Read Blocks
  • Read Blocks/Sec
  • Read IOs
  • Read IOs/Sec
  • Read Response Time
  • Total IOs
  • Total IOs/Sec
  • Worst Read Response Time
  • Worst Write Response Time
  • Write Blocks
  • Write Blocks/Sec
  • Write IOs
  • Write IOs/Sec
  • Write Response Time
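To illustrate the noisy-neighbor point, here is a minimal sketch (all VDisk names and IOPS figures are invented) of rolling per-VDisk load up to the MDisk group level, which is the view these new reports provide:

```python
# Hypothetical per-VDisk IOPS samples, keyed by the MDisk group
# each VDisk is provisioned from.
vdisk_iops = {
    "mdiskgrp0": {"vdisk_app": 150, "vdisk_backup": 4800},
    "mdiskgrp1": {"vdisk_logs": 900},
}

def group_totals(samples: dict) -> dict:
    """Roll per-VDisk IOPS up to a total per MDisk group."""
    return {grp: sum(vdisks.values()) for grp, vdisks in samples.items()}

# vdisk_app looks quiet on its own, but its group is carrying
# nearly 5,000 IOPS from its backup neighbor.
print(group_totals(vdisk_iops))  # → {'mdiskgrp0': 4950, 'mdiskgrp1': 900}
```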

 

We also added a specific report and charts for MDisk Group Utilization to track capacity information for the MDisk Group.

 

[Screenshots: MDisk Group Utilization report and charts]

The MDisk Group Utilization reports allow you to track the following key capacity metrics:

  • % Free
  • % Used
  • Free (GB)
  • Total (GB)
  • Used (GB)
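These five metrics are simple functions of one another; a quick sketch (a hypothetical helper with made-up capacity numbers) shows the relationships:

```python
def pool_utilization(total_gb: float, used_gb: float) -> dict:
    """Derive the five MDisk Group Utilization metrics from
    total and used capacity."""
    free_gb = total_gb - used_gb
    return {
        "Total (GB)": total_gb,
        "Used (GB)": used_gb,
        "Free (GB)": free_gb,
        "% Used": used_gb / total_gb * 100,
        "% Free": free_gb / total_gb * 100,
    }

stats = pool_utilization(total_gb=1200.0, used_gb=300.0)
print(stats["% Used"], stats["% Free"])  # → 25.0 75.0
```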

 

For a detailed overview of all the performance statistics provided by the IBM SVC SMI-S Provider, please see this document provided by IBM.

 

EMC Symmetrix VMAX

 

For VMAX, we added metrics for Device Performance and for Disk Performance.

 

Device Performance

 

One concern we've heard from customers in the past was the inability to get latency metrics for VMAX LUNs. As of Storage Manager 5.3.1, we now collect read and write latency metrics for LUNs. These do not yet roll up into a top-level metric in the Top 10 LUNs report on the Main Console, but we will look at addressing that in future releases.

 

[Screenshot: VMAX device latency metrics]

The two latency metrics we've added for read and write latency are:

  • Samples Average Writes Time (ms) and
  • Samples Average Reads Time (ms)
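The "Samples Average" naming reflects that these are averages over a collection sample. Assuming the provider exposes cumulative I/O-time and I/O-count counters (a common SMI-S pattern; the helper and numbers below are invented), the per-sample average falls out of the deltas:

```python
def sample_avg_latency_ms(time_us_prev: int, time_us_now: int,
                          ops_prev: int, ops_now: int) -> float:
    """Average per-I/O latency in milliseconds over one sample,
    derived from cumulative I/O-time (microseconds) and I/O-count
    counters read at the start and end of the sample."""
    ops = ops_now - ops_prev
    if ops == 0:
        return 0.0
    return (time_us_now - time_us_prev) / ops / 1000.0

# 2,000 reads that together consumed 5,000,000 us of service time
print(sample_avg_latency_ms(0, 5_000_000, 0, 2_000))  # → 2.5
```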

 

Hopefully, customers will find this information extremely valuable as they troubleshoot latency problems on their VMAX arrays.

 

Disk Performance

 

[Screenshot: VMAX Disk Performance metrics]

In addition to the existing I/O metrics, we added these metrics:

  • MB Read
  • MB Read/sec
  • MB Written
  • MB Written/sec
  • MB Transferred
  • MB Transferred/sec
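The per-second variants are just the counter deltas divided by the collection interval; a one-line sketch of the conversion (hypothetical helper and numbers):

```python
def mb_per_sec(mb_prev: float, mb_now: float, interval_s: float) -> float:
    """Throughput rate from two cumulative MB counters sampled one
    collection interval apart."""
    return (mb_now - mb_prev) / interval_s

# 900 MB read over a 300-second collection interval
print(mb_per_sec(10_000.0, 10_900.0, 300.0))  # → 3.0
```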

 

EMC VNX

 

We actually did quite a bit of work to add new metrics for VNX FAST Cache in Storage Manager 5.2.4. However, because 5.2.4 was a service release, we didn't make a lot of noise about the new metrics. That doesn't mean they are unimportant! VNX FAST Cache is an incredibly useful technology for improving the overall responsiveness of your storage subsystem, so it's important to know how your cache is performing. FAST Cache metrics added in Storage Manager 5.2.4 include:

 

FAST Cache Metrics

 

Asset Info Report - Cache High/Low Water Mark

 

[Screenshot: Asset Info report]

Asset - FAST Cache (new report)

 

[Screenshot: FAST Cache asset report]

 

Cache Enabled (shown on two reports)

 

Storage - LUNs Report

 

[Screenshot: LUNs report]

Storage - Storage Pool/RAID Group Utilization

 

[Screenshot: Storage Pool/RAID Group Utilization report]

We also added FAST Cache performance metrics to multiple Performance charts in the product, including:

 

Array Performance

[Screenshot: Array Performance chart]

Here we added support for these metrics:

  • FAST cache MB Flushed SPA
  • FAST cache MB Flushed SPA/sec
  • FAST cache MB Flushed SPB
  • FAST cache MB Flushed SPB/sec
  • %Dirty FAST Cache SPA
  • %Dirty FAST Cache SPB

 

LUN Performance

 

[Screenshot: LUN Performance chart]

Here we added support for these metrics:

  • FAST Cache Read Hits
  • FAST Cache Read Hits/sec
  • FAST Cache Read Misses
  • FAST Cache Read Misses/sec
  • FAST Cache Write Hits
  • FAST Cache Write Hits/sec
  • FAST Cache Write Misses
  • FAST Cache Write Misses/sec
  • % FAST Cache Read Hits
  • % FAST Cache Write Hits
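The two percentage metrics are derived from the hit and miss counters; a short sketch of the relationship (hypothetical helper and numbers):

```python
def fast_cache_hit_pct(hits: int, misses: int) -> float:
    """Percentage of I/Os served from FAST Cache: hits as a share
    of total cache lookups (hits + misses)."""
    total = hits + misses
    return 0.0 if total == 0 else hits / total * 100

# 900 read hits against 100 read misses is a 90% read hit rate.
print(fast_cache_hit_pct(900, 100))  # → 90.0
```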

 

Storage Pool/RAID Group Performance

 

[Screenshot: Storage Pool/RAID Group Performance chart]

Here we added support for these metrics:

  • FAST Cache Read Hits
  • FAST Cache Read Hits/sec
  • FAST Cache Read Misses
  • FAST Cache Read Misses/sec
  • FAST Cache Write Hits
  • FAST Cache Write Hits/sec
  • FAST Cache Write Misses
  • FAST Cache Write Misses/sec

 

For a full overview of all the metrics available through the EMC SMI-S Provider, please reference these two guides provided by EMC:

EMC® SMI-S Provider Version 4.4.0 Programmer’s Guide with Additional Extensions

EMC SMI-S Provider Version 4.5 Programmer’s Guide