Flash storage can be really, really fast. Crazy fast. So fast that some have openly asked if they really need to worry about storage performance anymore. After all, once you can throw a million IOPS at the problem, your bottleneck has moved somewhere else!


So do you really need to worry about storage performance once you go all-flash?


Oh yes, you definitely do!


All-Flash Storage Can Be Surprisingly Slow


First, most all-flash storage solutions aren't delivering that kind of killer performance. In fact, most all-flash storage arrays can push "only" tens of thousands of IOPS, not the millions you might expect! For starters, those million-IOPS storage devices are internal PCIe cards, not SSDs or storage arrays. So we need to revise our IOPS expectations downwards to the "hundred thousand or so" that an SSD can deliver. Then it gets worse.


Part of this is a common architectural problem found in all-flash storage arrays which I like to call the "pretend SSDs are hard disks" syndrome. If you're a vendor of storage systems, it's pretty tempting to do exactly what so many of us techies have done with our personal computers: Yank out the hard disk drives and replace them with SSDs. And this works, to a point. But "storage systems" are complex machines, and most have been carefully balanced for the (mediocre) performance characteristics of hard disk drives. Sticking some SSDs in just over-taxes the rest of the system, from the controller CPUs to the I/O channels.


But even storage arrays designed for SSDs aren't as fast as internal drives. The definition of an array includes external attachment, typically over a shared network, as well as redundancy and data management features. All of this gets in the way of absolute performance. Let's consider the network: Although a 10 Gb Ethernet or 8 Gb Fibre Channel link sounds like it would be faster than a 6 Gb SAS connection, this isn't always the case. Storage networks include switches (and sometimes even routers) and these add latency that slows absolute performance relative to internal devices. The same is true of the copy-on-write filesystems protecting the data inside most modern storage arrays.
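
To put rough numbers on the latency point, here is a minimal sketch using Little's Law (throughput equals outstanding requests divided by time per request). The two latency figures are assumptions I picked purely for illustration, not measurements from any real device or network.

```python
def iops_at_queue_depth(latency_us: float, queue_depth: int = 1) -> float:
    """Little's Law: throughput = outstanding requests / time per request."""
    return queue_depth * 1_000_000 / latency_us


# Hypothetical latencies in microseconds, chosen only for illustration.
MEDIA_US = 100.0   # assumed service time of the flash device itself
FABRIC_US = 50.0   # assumed extra time added by switches and array software

local = iops_at_queue_depth(MEDIA_US)
networked = iops_at_queue_depth(MEDIA_US + FABRIC_US)
print(f"local: {local:,.0f} IOPS, networked: {networked:,.0f} IOPS")
```

With these made-up figures, a single outstanding request sees a third of its throughput disappear even though the flash itself hasn't changed. Deeper queues hide some of this, but every individual request still waits longer.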


And maximum performance can really tax the CPU found in a storage array controller. Would you rather pay for a many-core CPU so you'll get maximum performance or for a bit more capacity? Most storage arrays, even specialized all-flash devices, under-provision processing power to keep cost reasonable, so they can't keep up with the storage media.


Noisy Neighbors


Now that we've reset our expectations for absolute performance, let's consider what else is slurping up our IOPS. In most environments, storage systems are shared between multiple servers and applications. That's kind of the point of shared networked storage after all. Traditionally, storage administrators have carefully managed this sharing because maximum performance was naturally quite limited. With all-flash arrays, there is a temptation to "punt" and let the array figure out how to allocate performance. But this is a very risky choice!


Just because an array can sustain tens or even hundreds of thousands of I/O operations per second doesn't mean your applications won't "notice" if some "noisy neighbor" application is gobbling up all that performance. Indeed, performance can get pretty bad since, without limits, each application can take as much performance as it can handle! You can find applications starved of performance and trudging along at disk speeds...
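
Here is a toy model of that starvation. It assumes an array with no QoS simply splits its throughput in proportion to what each application asks for once demand exceeds supply. Real arrays behave in more complicated ways, and the workload names and numbers below are hypothetical.

```python
def share_without_qos(array_iops, demand):
    """Split array throughput in proportion to demand once demand exceeds supply."""
    total = sum(demand.values())
    if total <= array_iops:
        return dict(demand)  # everyone gets what they asked for
    return {app: array_iops * want / total for app, want in demand.items()}


# Hypothetical workloads; every number here is made up for illustration.
demand = {"batch-analytics": 400_000, "database": 20_000, "web": 5_000}
shares = share_without_qos(100_000, demand)
for app, got in shares.items():
    print(f"{app}: wants {demand[app]:,} IOPS, gets {got:,.0f}")
```

Under this assumption, the database asks for 20,000 IOPS from a 100,000 IOPS array and ends up with fewer than 5,000, all because one neighbor is asking for far more than the array can give.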


This is why performance profiling and quality of service (QoS) controls are so important in shared storage systems, even all-flash. As an administrator, you must profile the applications and determine a reasonable amount of performance to allocate to each. Then you must configure the storage system to enforce these limits, assuming you bought one with that capability!


Note that some storage QoS implementations are absolute, while others are relative. In other words, some arrays require a hard IOPS limit to be set per LUN or share, while others simply throttle performance once things start "looking hot". If you can't tolerate uneven performance, you'll have to look at setting hard limits.
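
A hard per-LUN limit is commonly built as a token bucket, so here is a minimal sketch of one. To be clear, this is not how any particular array implements QoS. The class name, parameters, and numbers are all my own assumptions, and time is passed in explicitly so the example stays deterministic.

```python
class IopsLimiter:
    """Token bucket: each I/O spends one token; tokens refill at limit_iops per second."""

    def __init__(self, limit_iops: float, burst: float):
        self.rate = limit_iops
        self.capacity = burst
        self.tokens = burst
        self.last = 0.0

    def admit(self, now: float) -> bool:
        # Refill for the elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: the caller must queue or delay this I/O


# A hypothetical noisy LUN issuing 5,000 evenly spaced requests in one second
# against an assumed hard limit of 1,000 IOPS with a small burst allowance.
limiter = IopsLimiter(limit_iops=1_000, burst=10)
admitted = sum(limiter.admit(i / 5_000) for i in range(5_000))
print(f"admitted {admitted} of 5000 requests in one second")
```

Roughly a thousand requests get through no matter how hard the application pushes. A relative scheme would skip this bookkeeping until the array is actually under contention, which is exactly why its results are less even.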


Tiered Flash


If you really need maximum performance, tiered storage is the only way to go. If you can profile your applications and segment their data, you can tier storage, reserving maximum-performance flash for just a few hotspots.
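
One simple way to turn a profile into a tiering decision is to rank data by IOPS per gigabyte and fill the flash tier from the top. The sketch below is a greedy heuristic under that assumption, not an optimal placement algorithm, and the dataset names and figures are invented.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    name: str
    size_gb: float
    iops: float  # assumed to come from your own application profiling


def pick_for_flash(datasets, flash_gb):
    """Greedy: place the highest IOPS-per-GB data on flash until it is full."""
    chosen, free = [], flash_gb
    for d in sorted(datasets, key=lambda d: d.iops / d.size_gb, reverse=True):
        if d.size_gb <= free:
            chosen.append(d.name)
            free -= d.size_gb
    return chosen


# Hypothetical profile; names and numbers are for illustration only.
profile = [
    Dataset("orders-index", 200, 40_000),
    Dataset("orders-table", 1_500, 30_000),
    Dataset("archive", 8_000, 500),
    Dataset("session-cache", 100, 15_000),
]
print(pick_for_flash(profile, flash_gb=500))
```

With 500 GB of top-tier flash, only the index and the session cache make the cut. The big table is busy in absolute terms, but it's too large and too "cold per gigabyte" to earn a spot.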


Today's hybrid storage arrays allow data to be "pinned" into flash or cache. This delivers maximum performance but can "waste" precious flash capacity if you're not careful. You can also create higher-performance LUNs or shares in all-flash storage arrays using RAID-10 rather than parity or turning off other features.


But if you want maximum performance, you'll have to move the data off the network. It's pretty straightforward to install an NVMe SSD in a server directly, especially the modern servers with disk-like NVMe slots or M.2 connectors. These deliver remarkable performance but offer virtually no data protection. So doing this with production applications puts data at risk and requires a long, hard look at the application.


You can also get data locality by employing a storage caching software product. There are a few available out there (SanDisk FlashSoft, Infinio, VMware vFRC, etc.) and these can help mitigate the risks of local data by ensuring that writes are preserved outside the server. But each has its own performance quirks, so none is a "silver bullet" for performance problems.


Stephen's Stance


Hopefully I've given you some things to think about when it comes to storage performance. Just going "all-flash" isn't going to solve all storage performance problems!


I am Stephen Foskett and I love storage. You can find more writing like this at blog.fosketts.net, connect with me as @SFoskett on Twitter, and check out my Tech Field Day events.