A storage system on its own is not useful. Sure, it can store data, but how are you going to put any data on it? Or read back the data that you just stored? You need to connect clients to your storage system. For this post, let’s assume we are using block protocols such as iSCSI or FC against a traditional block storage system. The same principles also apply to file protocols (like NFS and SMB) and, to some extent, even to hyper-converged infrastructure, but we will get back to that later.

 

Direct attaching clients to the storage system is an option. There is no contention between clients on the ports, and it is cheap. In fact, I still see direct-attached solutions in cases where low cost wins over client scalability. However, direct attaching your clients to a storage system does not scale well as the number of clients grows. Front-end ports on a storage array are expensive and limited in number.

 

Add some network

Therefore, we add some sort of network. For block protocols, that is a SAN. The two most commonly used protocols are the FC protocol (FCP) and iSCSI. Both carry SCSI commands, but the network equipment is vastly different: FC switches vs. Ethernet switches. Both have their advantages and disadvantages, and IT professionals will usually have a strong preference for one of the two.

 

Once you have settled on a protocol, the switch line speed is usually the first thing that comes up. FC commonly runs at 16Gbit, with 32Gbit switches entering the market lately. Ethernet, however, is making bigger jumps, with 10Gbit being standard within a rack or wiring closet and 25/40/100Gbit commonly used for uplinks to the data center cores.

 

The current higher speeds of Ethernet networks are often one of the arguments why “Ethernet is winning over FC.” 100Gbit Ethernet has already been on the market for quite some time, and the next obvious iteration of FC is “only” going to achieve 64Gbit.

 

Oversubscription

Once you start attaching more clients to a storage system than it has storage ports, you start oversubscribing. 100 servers attached to 10 storage ports means you have on average 10 servers on each storage port. Even worse, if those servers are hypervisors running 30 virtual machines each, you will now have 300 VMs competing for resources on a single port.
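To put numbers on it, the fan-in math is simple enough to script. A minimal sketch in Python, using the example figures above:

```python
# Rough fan-in math for storage front-end ports.
servers = 100          # hosts attached to the array
storage_ports = 10     # front-end ports on the array
vms_per_server = 30    # VMs per hypervisor

servers_per_port = servers / storage_ports
vms_per_port = servers_per_port * vms_per_server

print(f"{servers_per_port:.0f} servers per storage port")        # 10
print(f"{vms_per_port:.0f} VMs competing for one storage port")  # 300
```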

 

Even the most basic switch will have some sort of bandwidth/port monitoring functionality. If it does not have a management GUI that can show you graphs, third-party software can pull that data out of the switch using SNMP. As long as traffic in/out does not exceed 70% you should be OK, right?
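If all you get back from the switch is raw interface counters (ifHCInOctets/ifHCOutOctets via SNMP, for example), turning two samples into a utilization percentage is straightforward. A minimal sketch, assuming you have already polled the counters and know the link speed:

```python
def port_utilization(octets_t0, octets_t1, interval_s, link_speed_bps):
    """Percent utilization from two octet-counter samples taken interval_s apart.
    Counter wraps are ignored for brevity."""
    bits = (octets_t1 - octets_t0) * 8
    return bits / interval_s / link_speed_bps * 100

# Example: a nominal 16Gbit port sampled 5 minutes apart.
print(f"{port_utilization(1_200_000_000, 450_000_000_000, 300, 16e9):.1f}% utilized")
```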

 

The challenge is that this is not the whole truth. Other, more obscure limitations might ruin your day. For example, you might be sending a lot of very small I/O to a storage port. Storage vendors often brag about 4KB I/O performance specs. 25,000 4KB IOps only accounts for roughly 100MB/s or 800Mbit (excluding overhead). So, while your SAN port shows a meager 50% utilization, your storage port or HBA could still be overloaded.
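The bandwidth math behind that example is equally simple. A minimal sketch, using the round numbers from the paragraph above (4KB treated as 4,000 bytes, protocol overhead ignored):

```python
def iops_to_bandwidth(iops, io_size_kb):
    """Convert an IOps figure and I/O size into MB/s and Mbit/s (overhead ignored)."""
    mb_per_s = iops * io_size_kb / 1000     # 4KB treated as 4,000 bytes, like the post
    mbit_per_s = mb_per_s * 8
    return mb_per_s, mbit_per_s

mb, mbit = iops_to_bandwidth(25_000, 4)
print(f"25,000 x 4KB IOps is about {mb:.0f} MB/s or {mbit:.0f} Mbit/s")   # 100 MB/s, 800 Mbit/s
```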

 

It becomes more complex once you start connecting SAN switches together and distributing clients and storage systems across this network of switches. It is hard to keep track of how much client and storage traffic traverses the ISLs (Inter-Switch Links). In this case, it is a smart move to keep your SAN topology simple and to be careful with oversubscription ratios. Do the oversubscription math, and look beyond the standard bandwidth graphs. Check error counters, and in an FC SAN that has long-distance links, check whether the Buffer-to-Buffer credits deplete on a port.
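For the ISL part, a common sanity check is to compare the aggregate bandwidth of the edge ports with the bandwidth of the ISLs they have to cross. A minimal sketch with hypothetical port counts:

```python
# Hypothetical edge switch: 24 host ports at 16Gbit, uplinked over 2 x 32Gbit ISLs.
host_ports, host_speed_gbit = 24, 16
isl_count, isl_speed_gbit = 2, 32

edge_bandwidth = host_ports * host_speed_gbit    # 384 Gbit
isl_bandwidth = isl_count * isl_speed_gbit       # 64 Gbit

print(f"ISL oversubscription ratio: {edge_bandwidth / isl_bandwidth:.0f}:1")   # 6:1
```

Whether a ratio like 6:1 is acceptable depends entirely on the traffic profile; the point is to know the number before a backup window finds it for you.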

 

Ethernet instead of FC

The same principles apply to Ethernet. One argument for choosing an Ethernet-based SAN is that the company already has LAN switches in place. In these cases, be extra vigilant. I am not opposed to sharing a switch chassis between SAN and normal client traffic. However, ports, ISLs, and switch modules/ASICs are prime contention points. You do not want your SAN performance to drop because a backup, restore, or large data transfer starts between two servers, and both types of traffic start fighting for the available bandwidth.

 

Similarly, hyper-converged infrastructure solutions like VxRail and other VMware VSAN-based offerings place high demands on the Ethernet uplinks. Ideally, you would want to ensure that VMware VSAN uses dedicated, high-speed uplinks.

Which camp are you in? FC or Ethernet, or neither? And how do you ensure that the SAN doesn’t become a bottleneck? Comment below!

At some point, your first storage system will be “full.” I’m writing it as “full” because the system might not actually be 100% occupied with data at that exact point in time. The system could be full for another technical reason. For example, shared components in a system (e.g. CPUs) are overloaded before you ever install the maximum number of drives, and upgrading those would be too expensive. Or an administrative decision might have made you stop handing out new capacity from an existing system. For example, you’re expecting rapid organic growth of several thin-provisioned volumes, which would soon consume the capacity headroom of the current system.
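For the thin-provisioning scenario, a quick overcommit and headroom calculation makes that administrative call easier to defend. A minimal sketch with hypothetical numbers:

```python
# Hypothetical thin-provisioned pool.
pool_capacity_tb = 100
allocated_tb = 180            # sum of all thin volumes handed out
consumed_tb = 70              # what is actually written today
growth_tb_per_month = 5       # observed organic growth

overcommit_ratio = allocated_tb / pool_capacity_tb
months_of_headroom = (pool_capacity_tb - consumed_tb) / growth_tb_per_month

print(f"Overcommit ratio: {overcommit_ratio:.1f}:1")                     # 1.8:1
print(f"Headroom lasts roughly {months_of_headroom:.0f} more months")    # ~6 months
```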

 

The fact that a single system has reached its maximum capacity, whether for technical or administrative reasons, does not mean you need to turn away customers. IT should be a facilitator to the business. If the business needs to store additional data, there’s often a good reason for it. In health care, it could be storing medical images; for a service/cloud provider, hosting more (paying) customers. So instead of communicating “Sorry, we’re full, go somewhere else!”, we should say something along the lines of “Yes, we can store your data, but it’s going to land on a different system”. In fact, just store the data and leave out the system part!

 

More of the same?

When your first system is full and you’re buying another one, you could buy a similar system and install it next to the original one. It might be a bit faster, or a bit more tuned for capacity. Or completely identical, if you were happy with the previous one.

 

On the other hand, this might also be a good moment to differentiate between the types of data in your company. For example, if you’ve started out with block storage, maybe this is the time to buy a NAS and offload some of the file data to it.

 

Regardless of type, introducing a second system will create a couple of challenges for the IT department. First, you’ll now have to decide which system you want to land new data on. With identical systems, it might be a fill-and-spill principle, where you fill up the first system and then move over to the second box.
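The fill-and-spill decision itself is easy to express: walk the systems in order and place new capacity on the first one that still has headroom. A minimal sketch, with the 90% fill threshold purely as an assumption:

```python
def pick_system(systems, request_tb, max_fill=0.9):
    """Fill and spill: return the first system that can take the request
    without exceeding the max_fill threshold."""
    for name, capacity_tb, used_tb in systems:
        if used_tb + request_tb <= capacity_tb * max_fill:
            return name
    return None   # everything is "full" -- time to buy another box

systems = [("array-01", 500, 470), ("array-02", 500, 120)]
print(pick_system(systems, request_tb=20))   # array-02: array-01 has no headroom left
```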

 

Once you introduce different types and speeds of systems though, you need to differentiate between types of data and the capabilities of each system. Some data might be better suited to land on a NAS, other data on a spinning disk SAN, and another flavor of data on an all-flash SAN array. And you need to keep track of which clients/devices are attached to which systems, so documentation and a clear naming convention are paramount.

 

Keeping it running

Then there’s the challenge of keeping all the storage systems running. You can probably monitor a handful of systems with the in-box GUI, but that doesn’t scale well. At some point, you need to add central monitoring software at the very least, to group all the alerts and activities in a single user interface. Even better would be central management, so you don’t have to go back to the individual boxes to allocate LUNs and shares.

 

With an increasing number of storage systems comes an increasing number of attached servers and clients. Ensuring that all clients, interconnects, and systems are on the right patch levels is a task that cuts vertically across all these layers. You should look at the full stack to ensure you don’t break anything by patching one component to a newer level.

 

If you glue too many systems together, you’ll end up with a spaghetti of shared dependencies that makes patch management difficult, if not impossible. Some clients will be running old software that prevents other layers (like the SAN or storage array) from being patched to the newest levels. Other attached clients might rely on those newer code levels because they run a newer hypervisor. You’ll quickly end up with a very long string of upgrades that need to be performed before you’re fully up to date and compliant. So, it’s probably best to create building blocks of some sort.

 

How do you approach the “problem” of growing data? Do you throw more systems at it, or upgrade capacity/performance of existing systems? And how do you ensure that the infrastructure can be managed and patched? Let me know!

When designing the underlying storage infrastructure for a set of applications, several metrics are important.

 

First, there’s capacity. How much storage do you need? This is a metric that’s well understood by most people. People see GBs and TBs on their own devices and subscription plans on a daily basis, so they’re well aware of it.

 

There’s also performance, which is a bit more difficult. People tend to think in terms of “slow vs. fast,” but those are subjective terms. For storage, the most customer-centric metric is response time: how long does it take to process a transaction? Response time is, however, a function of a few other metrics, including I/O operations per second, the size of each I/O, and the queue depth of other I/O in front of you.
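One way to see how those metrics hang together is Little’s Law: the average number of outstanding I/Os (the queue depth) equals the I/O arrival rate multiplied by the average response time. A minimal sketch with made-up numbers:

```python
def avg_response_time_ms(iops, avg_queue_depth):
    """Little's Law: queue depth = arrival rate x response time, rearranged."""
    return avg_queue_depth / iops * 1000

print(f"{avg_response_time_ms(20_000, 16):.1f} ms")    # 0.8 ms
print(f"{avg_response_time_ms(20_000, 160):.1f} ms")   # 8.0 ms: ten times the queue, ten times the wait
```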

 

Sizing a storage system

If you size a storage system to meet both capacity and peak performance requirements, you will generally have low response times. Capacity is easy; I need X Terabytes. Ideally, you’d also have some performance numbers to base the size of your system on, including expected IOps, I/O size, and read:write ratio to name a few. If you don’t have these performance requirements, a guesstimate is often the closest you can get.
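With those inputs, a rough back-end sizing calculation looks like the sketch below. The RAID write penalties are the commonly quoted rules of thumb (mirroring ~2, RAID 5 ~4); treat it as an illustration, not a vendor sizing tool:

```python
def backend_iops(frontend_iops, read_share, write_penalty):
    """Rough back-end IOps for a front-end workload with a given read share."""
    reads = frontend_iops * read_share
    writes = frontend_iops * (1 - read_share)
    return reads + writes * write_penalty

# Example: 10,000 front-end IOps, 70:30 read:write, RAID 5 (write penalty ~4).
print(f"{backend_iops(10_000, 0.70, 4):,.0f} back-end IOps")   # 19,000
```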

 

With this information, and an idea of which response time you’re aiming for, it’s possible to configure a system that should be in the sweet spot. Small enough to make it cost effective, yet large enough that you can absorb some growth and/or unexpected peaks in performance and capacity. Depending on your organization and budget, you might undersize it to only cover the 95th percentile peak performance, or you might oversize it to facilitate growth in the immediate future.
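The 95th percentile itself is easy to derive from whatever performance samples you already collect; the point is that you deliberately ignore the top few percent of peaks. A minimal sketch with synthetic data:

```python
import random

random.seed(42)
# One day of 5-minute IOps samples: a steady load plus a short backup-window spike.
samples = [random.gauss(5000, 400) for _ in range(288)]
samples[100:104] = [14000, 15000, 14500, 13800]

p95 = sorted(samples)[int(0.95 * len(samples)) - 1]
print(f"95th percentile is about {p95:.0f} IOps")   # well below the 15,000 IOps spike
```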

 

Let it grow, let it grow… and monitor it!

Over time though, your environment will start to grow. Data sets increase and more users connect to it. Performance demands grow in step with capacity. This places additional demands on the system; demands that it wasn’t sized for initially.

 

Monitoring is crucial in this phase of the storage system lifecycle. You need to accurately measure the capacity growth over time. Automated forecasts will help immensely. Keep an eye on the forecasting algorithms and the statistics history. If the algorithm doesn’t use enough historical data, it might result in extremely optimistic or pessimistic predictions!
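A forecast doesn’t have to be fancy to be useful; even a straight-line fit over enough history yields a “days until full” figure you can act on. A minimal, dependency-free sketch with made-up monthly samples:

```python
def days_until_full(history_tb, capacity_tb, interval_days=30):
    """Fit a straight line through the history and project when the pool hits capacity."""
    n = len(history_tb)
    xs = range(n)
    mean_x, mean_y = sum(xs) / n, sum(history_tb) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history_tb))
    var = sum((x - mean_x) ** 2 for x in xs)
    growth_per_day = (cov / var) / interval_days
    return (capacity_tb - history_tb[-1]) / growth_per_day

# Twelve monthly samples for a 100TB pool.
usage_tb = [40, 42, 45, 47, 50, 52, 55, 58, 60, 63, 66, 70]
print(f"Roughly {days_until_full(usage_tb, 100):.0f} days until full")
```

Feed the same fit only two or three samples and it will happily produce exactly the wildly optimistic or pessimistic predictions this paragraph warns about.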

 

Similarly, performance needs to be guaranteed throughout the life of the array. The challenge with performance monitoring is that it’s usually a chain of components that influence each other. Disks connect to busses, which connect to processors, which connect to front-end ports, and you need to monitor them all. Depending on the component that’s overloaded, you might be able to upgrade it. For example, connect additional front-end ports to the SAN or upgrade the storage processors. At some point though, you’re going to hit a limit. Then what?

 

Failure domain

Fewer, larger systems have several advantages over multiple smaller arrays. There are fewer systems to manage, which saves you time in monitoring and day-to-day maintenance. Plus, there’s less waste, as silos tend not to be fully utilized.

 

One important aspect to consider, though, is the failure domain. What’s the impact if a system or component fails? Sure, you could grow your storage system to the largest possible size. But if it fails, how long would you need to restore all that data? In a multi-tenancy situation, how many customers would be impacted by a system failure? Licenses for larger systems are sometimes disproportionately more expensive than those for their smaller cousins; does that offset the additional hassle of managing multiple systems? There are multiple possible approaches. Let me know which direction you’d choose: fewer, bigger systems, or multiple smaller systems!
