Storage systems, like any other network or server hardware, can develop bottlenecks and performance issues, and it's the storage administrator's job to keep these in check and triage problems as they arise. It's a misconception that all storage bottlenecks originate in the storage disks. Other key components of the storage infrastructure, such as the storage controller, FC switches, and front-end ports, can also go off course and, in turn, degrade storage performance.

In this blog, we'll look at some of the key factors that cause performance bottlenecks in disk array controllers (also known as RAID controllers).

 

What is a Disk Array Controller?

A disk array controller is a device that manages the physical disk drives and presents them to the computer as logical units. It almost always implements hardware RAID, and is therefore sometimes referred to as a RAID controller. It often also provides additional disk cache.[1]

The disk array controller is made up of three important parts which play a key role in the controller's functioning and also serve as indicators of storage I/O bottlenecks. These are:

  • A CPU that processes the data sent to the controller
  • An I/O port that includes:
    • A back-end interface to establish communication with the storage disks
    • A front-end interface to communicate with a computer's host adapter
  • Software executed by the controller's processor, which also consumes processor resources

 

These components can degrade the performance of the storage subsystem when left unchecked. Third-party storage management tools can help provide this visibility, but as a storage administrator you should know which metrics to examine to understand what could go wrong with the disk array controller.

 

Common Causes of Disk Array Controller Bottlenecks

#1 Controller Capacity Overload: A disk array controller may be asked to support more than it can practically handle. In scenarios such as thin provisioning, automated tiering, and snapshots, the controller is put under capacity overload, which can impact storage I/O operations. Operations such as deduplication and compression only add more load to the controller.

 

#2 Server Virtualization & Random I/O Workloads: Thanks to server virtualization, the disk array controller now handles many more workloads than the single application load per host it saw in the past. With each host supporting multiple workloads and sending a steady stream of random I/O, it becomes much harder for the storage controller to locate the data each virtual machine is requesting.

 

Key Metrics to Monitor Disk Array Controller Bottlenecks

#1 CPU Utilization: Monitor the CPU utilization of the disk array controller closely. Capture CPU utilization data during peak load times and analyze what is causing the additional load and whether the storage controller can cope with the processing requirements.
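As a sketch of the kind of analysis this implies, assuming you can already export per-sample controller CPU readings (the sample values and the 85% threshold below are hypothetical, not from any particular array), a small script can flag sustained load during a peak window:

```python
# Sketch: flag sustained controller CPU load from sampled readings.
# The readings and the 85% threshold are illustrative assumptions;
# real values would come from your array's management API or tool.

def sustained_overload(samples, threshold=85.0, window=3):
    """Return True if `window` consecutive samples exceed `threshold` (%)."""
    run = 0
    for pct in samples:
        run = run + 1 if pct > threshold else 0
        if run >= window:
            return True
    return False

# Hypothetical CPU utilization samples (%) taken during peak load
readings = [62.0, 71.5, 88.0, 91.2, 90.4, 76.3]

print(sustained_overload(readings))  # True: three consecutive samples above 85%
```

Looking for consecutive samples over the threshold, rather than a single spike, helps distinguish a genuinely overloaded controller from a momentary burst.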

  

#2 I/O Utilization: It's also important to monitor the I/O utilization metrics of the controller in two directions:

  • From the host to the controller
  • From the controller to the storage array

 

Together with CPU utilization, these metrics let you determine whether the disk array controller is overloaded on processing or whether one of the I/O paths is saturating its bandwidth. You can then judge whether the storage controller is able to meet the CPU capacity and I/O bandwidth demand with its available resources.
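A minimal sketch of that comparison, assuming hypothetical observed and rated throughput figures for the two paths (the port names and numbers are illustrative, not from a real array), might look like this:

```python
# Sketch: compare front-end and back-end utilization to see which
# side of the controller is saturating. The port names and rated
# bandwidths are illustrative assumptions.

def utilization(observed_mbps, rated_mbps):
    """Percentage of rated bandwidth currently in use."""
    return 100.0 * observed_mbps / rated_mbps

ports = {
    "front-end (host -> controller)": (1400.0, 1600.0),  # observed, rated MB/s
    "back-end (controller -> disks)": (520.0, 1200.0),
}

for name, (observed, rated) in ports.items():
    pct = utilization(observed, rated)
    flag = "  <-- near saturation" if pct > 80.0 else ""
    print(f"{name}: {pct:.1f}% of rated bandwidth{flag}")
```

In this hypothetical case the front-end path is near saturation while the back-end path has headroom, pointing the investigation at the host-facing ports rather than the disks.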

 

As George Crump, President of Storage Switzerland, recommends on TechTarget, you can address storage controller bottlenecks by:

  • Increasing the processing power of the controller CPU
  • Using more advanced storage software
  • Making the processor more efficient by implementing task-specific CPUs. This allows you to move portions of code to silicon or a field-programmable gate array (FPGA), enabling those sections of code to execute faster and the system to deliver those functions without impacting overall performance.
  • Leveraging the hypervisor within server and/or desktop virtualization infrastructures to perform more of the data services tasks, such as thin provisioning, snapshots, cloning, and even tiering.
  • Using scale-out storage, which means adding servers (often called nodes) to the storage system, where each node contributes additional capacity, I/O, and processing power.
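The scale-out point can be made concrete with a back-of-the-envelope aggregation; the node specifications below are hypothetical:

```python
# Sketch: aggregate capacity, IOPS, and CPU cores across scale-out nodes.
# All node specs are hypothetical illustrations.
nodes = [
    {"capacity_tb": 48, "iops": 60_000, "cores": 16},
    {"capacity_tb": 48, "iops": 60_000, "cores": 16},
    {"capacity_tb": 96, "iops": 80_000, "cores": 24},  # a larger third node
]

totals = {
    key: sum(n[key] for n in nodes)
    for key in ("capacity_tb", "iops", "cores")
}
print(totals)  # each added node contributes capacity, I/O, and processing power
```

The point of scale-out is that, unlike upgrading a single controller, every node added grows all three resources at once, so no single controller becomes the chokepoint.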