
When is a storage system full?

Level 9

When designing the underlying storage infrastructure for a set of applications, several metrics are important.

First, there’s capacity: how much storage do you need? This is a metric that’s well understood by most people; they see GBs and TBs on their own devices and subscription plans on a daily basis, so they’re well aware of it.

There’s also performance, which is a bit more difficult. People tend to think in terms of “slow vs. fast,” but those are subjective labels. For storage, the most customer-centric metric is response time: how long does it take to process a transaction? Response time is, however, a function of several other metrics, including I/O operations per second (IOps), the size of an I/O, and the queue depth of other I/O in front of you.
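To make that relationship concrete: Little’s Law ties these together, with average queue depth equal to IOps times response time. A minimal sketch, using made-up numbers:

```python
# Little's Law: queue_depth = iops * response_time, so average
# response time follows from the load. All numbers here are invented.

def response_time_ms(iops: float, queue_depth: float) -> float:
    """Average response time in milliseconds at a given load."""
    return queue_depth / iops * 1000.0

print(response_time_ms(20_000, 8))   # 0.4 ms: shallow queue, snappy
print(response_time_ms(20_000, 64))  # 3.2 ms: same IOps, deeper queue
```

Same IOps, but eight times the queue depth means eight times the wait.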

Sizing a storage system

If you size a storage system to meet both capacity and peak performance requirements, you will generally have low response times. Capacity is the easy part: “I need X terabytes.” Ideally, you’d also have some performance numbers to base the size of your system on: expected IOps, I/O size, and read:write ratio, to name a few. If you don’t have these performance requirements, a guesstimate is often the closest you can get.
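To illustrate how those numbers turn into a size, here’s a back-of-the-envelope sketch. The per-disk IOps figure and the RAID write penalty are assumptions for the example, not vendor specifications:

```python
import math

def disks_needed(front_end_iops: float, read_fraction: float,
                 disk_iops: float = 180, raid_write_penalty: int = 4) -> int:
    """Estimate spindle count: a read costs one back-end I/O, a write
    costs raid_write_penalty back-end I/Os (RAID 5 style)."""
    write_fraction = 1.0 - read_fraction
    backend_iops = front_end_iops * (
        read_fraction + write_fraction * raid_write_penalty)
    return math.ceil(backend_iops / disk_iops)

# 10,000 front-end IOps at a 70:30 read:write ratio:
print(disks_needed(10_000, read_fraction=0.7))  # 106 spindles
```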

With this information, and an idea of what response time you’re aiming for, it’s possible to configure a system that sits in the sweet spot: small enough to be cost-effective, yet large enough to absorb some growth and/or unexpected peaks in performance and capacity. Depending on your organization and budget, you might undersize it to cover only the 95th-percentile peak performance, or oversize it to facilitate growth in the immediate future.
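For the percentile approach, here’s a quick sketch of sizing to the 95th percentile versus the absolute peak; the hourly IOps samples are invented:

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[rank - 1]

# 20 hourly IOps samples, with one extreme outlier (a backup storm):
hourly_iops = [4200, 4800, 5100, 5600, 6000, 6200, 6500, 6800, 7000, 7200,
               7400, 7600, 7900, 8100, 8300, 8600, 8900, 9200, 9500, 30000]

print(max(hourly_iops))             # 30000: sizing for the absolute peak
print(percentile(hourly_iops, 95))  # 9500: sizing for the 95th percentile
```

Sizing for 9,500 instead of 30,000 IOps buys a much cheaper system, at the cost of degraded response times during that one storm.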

Let it grow, let it grow… and monitor it!

Over time though, your environment will start to grow. Data sets increase and more users connect. Performance demands grow in step with capacity. This places additional demands on the system: demands it wasn’t sized for initially.

Monitoring is crucial in this phase of the storage system lifecycle. You need to accurately measure the capacity growth over time. Automated forecasts will help immensely. Keep an eye on the forecasting algorithms and the statistics history. If the algorithm doesn’t use enough historical data, it might result in extremely optimistic or pessimistic predictions!
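As a toy example of why the amount of history matters, here’s a deliberately simple linear forecast. A real array’s forecasting is more sophisticated, and all figures below are hypothetical:

```python
def linear_forecast(history_tb, months_ahead):
    """Fit a straight line through monthly capacity samples, then
    extrapolate months_ahead past the last sample."""
    n = len(history_tb)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(history_tb) / n
    slope = (sum((x - x_mean) * (y - y_mean)
                 for x, y in zip(xs, history_tb))
             / sum((x - x_mean) ** 2 for x in xs))
    x_future = n - 1 + months_ahead
    return y_mean + slope * (x_future - x_mean)

used_tb = [40, 43, 47, 50, 55, 56]       # six months of history
print(linear_forecast(used_tb, 6))       # ~77.4 TB using all six months
print(linear_forecast(used_tb[-2:], 6))  # 62.0 TB using only the last two
```

The same environment, forecast from only the last two months of data, looks far more optimistic: exactly the trap the article warns about.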

Similarly, performance needs to be guaranteed throughout the life of the array. The challenge with performance monitoring is that it’s usually a chain of components that influence each other. Disks connect to buses, which connect to processors, which connect to front-end ports, and you need to monitor them all. Depending on the component that’s overloaded, you might be able to upgrade it: for example, connect additional front-end ports to the SAN or upgrade the storage processors. At some point, though, you’re going to hit a limit. Then what?
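In practice, that means collecting utilization per component and upgrading whichever link in the chain is hottest. A trivial sketch; the component names and utilization figures are invented:

```python
# Per-component utilization, as fractions of each component's ceiling.
utilization = {
    "disks": 0.55,
    "back-end buses": 0.60,
    "storage processors": 0.92,  # the overloaded link in this example
    "front-end ports": 0.48,
}

bottleneck = max(utilization, key=utilization.get)
print(f"Upgrade candidate: {bottleneck} at {utilization[bottleneck]:.0%}")
```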

Failure domain

Fewer, larger systems have several advantages over multiple smaller arrays. There are fewer systems to manage, which saves you time in monitoring and day-to-day maintenance. Plus, there’s less waste, as storage silos tend not to be fully utilized.

One important aspect to consider, though, is the failure domain. What's the impact if a system or component fails? Sure, you could grow your storage system to the largest possible size. But if it fails, how long would you need to restore all that data? In a multi-tenancy situation, how many customers would be impacted by a system failure? Licenses for larger systems are sometimes disproportionately more expensive than their smaller cousins; does this offset the additional hassle of managing multiple systems? There are multiple possible approaches. Let me know which direction you’d choose: fewer, bigger systems, or multiple smaller systems!
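Before you answer, it’s worth putting actual numbers on that restore question. A quick sketch, where the sustained restore throughput is purely an assumption:

```python
def restore_hours(capacity_tb: float, throughput_gb_s: float) -> float:
    """Hours needed to stream capacity_tb back from backup at a
    sustained throughput in GB/s."""
    return capacity_tb * 1024 / throughput_gb_s / 3600

# Restoring a full 2 PB array at a sustained 5 GB/s:
print(restore_hours(2048, 5.0))  # ~116 hours, i.e. nearly five days
```

Nearly five days of restore time is a very different failure domain than the few hours a smaller array might need.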

22 Comments
MVP

Good article

Today we are buying storage 1.5 Petabytes at a time, a couple of times each year.  It's part of being in Health Care, having to store everyone's medical and financial records for their entire life, plus twenty years.  Ensuring access to those records one hundred years from now means continually migrating from technology to technology, hardware and software platform vendor by vendor.

I don't know of any part of this environment where it would be acceptable or even possible to say "The storage is full."

Level 20

I've also noted that many storage systems start to perform really badly when not even close to being fully utilized.

Level 11

Good article. I would also suggest monitoring up front when possible, to right-size the environment as discussed. When I worked for an MSP, we would often offer, during the initial sales engineering engagements, to put monitoring on the customer's existing infrastructure for a couple of weeks to 30 days. This was very helpful, and several times the actual data showed the initial estimates were way off.

Level 12

We have gone through this struggle for the last 3 years now. Our current SAN is literally full, and has been for the last 3 years. When the SAN was purchased it met our storage needs, but then suddenly we had massive expansion of data across a lot of different layers. A lot of this was due to a big push to get most of our servers from physical to virtual. Our current SAN is about 35TB usable, takes up an entire rack, and is all spinning disk. Our new SAN that we just installed in the new building last week is 2U and 100TB usable SSD. We literally tripled our storage capacity, massively improved our performance, and shrunk the footprint to almost nothing. And this new SAN is cheaper than what our old one was when we bought it 5 years ago, and cheaper than its replacement would have been. We bought a Cadillac when we needed a Taurus 5 years ago; we didn't make that same mistake this time.

Level 9

rschroeder wrote:

Today we are buying storage 1.5 Petabytes at a time, a couple of times each year.  It's part of being in Health Care, having to store everyone's medical and financial records for their entire life, plus twenty years.  Ensuring access to those records one hundred years from now means continually migrating from technology to technology, hardware and software platform vendor by vendor.

I don't know of any part of this environment where it would be acceptable or even possible to say "The storage is full."

IT should definitely not reply "storage is full, sorry you can't store your data" to a change request for more capacity! Especially in health care, where you need to store and retain pretty much everything that's patient related, data growth is continuous. What I meant is: at what point do you consider one system to be full and will you add a new system?

Level 13

Good Article

Moore's Law rocks!

That was nice VAR service!

Level 20

Lololol a Taurus hehe!

We don't ever call storage "full".  Between de-duplication and compression and planned migrations and upgrades to newer/faster arrays and systems and competing brands, there's never "full" anything.  "Full" would be a disaster, and we stay at least a year ahead of any resource running out of space.

We plan out five years in advance and require all customers to provide storage space requirements for that same five years.  All new or incoming projects, systems, or technologies are also required to provide their five-year storage requirements, and aren't allowed online until the necessary storage for the coming year is already purchased, configured, and ready to support their needs.

And so far we're continuing to buy storage at about 1.5 PB every three to six months.  We are putting in IBM storage next to Windows storage--similar capacity systems, different vendors & details, due to a need to stay on top of app & vendor requirements.

The nice thing is the support contracts.  Each time one is preparing to expire, the vendor notes that we can buy newer and bigger storage solutions for less than the cost of the old gear's support contract, and we take advantage of Moore's Law every time.  We're not measuring our storage in Exabytes--yet.  But I can see that happening in a few years.  Medical storage is key.

Level 10

Last year when we went through the process of upgrading our storage, the sizing "dance" took what seemed like forever. Our biggest hurdle was the fast, good, or cheap triad. The most important stipulation placed on us by our customer was cheap, followed closely by good. We, the IT guys, wanted fast up there because if it wasn't going to be fast, the customer wouldn't consider it good. We finally came to grips with it and so far, things have gone well: large enough for current requirements with growth, IOPS fast enough for satisfactory performance, and inexpensive enough to meet the cheap requirement.

This is us as well, and it seems the footprint in the data center shrinks each generation despite capacity and performance growth.

MVP

My favorite bit of this discussion is: which part is reporting full?

With the SAN providing a thin-provisioned and compressed LUN to the VM host, which then provides a thin-provisioned disk to the guest OS, which then uses an LVM to provide a thin-provisioned disk to the application...

Someone, somewhere has to know the real number.
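To illustrate that, here's a hypothetical sketch of how every layer in such a stack reports its own, different "full" percentage (all numbers invented):

```python
# Each layer only sees what it provisioned and what was written to it;
# none of them sees what the physical pool has actually absorbed
# after compression.
layers = [
    # (layer, provisioned_tb, written_tb)
    ("SAN LUN (thin, compressed)",    100, 30),
    ("VM host datastore (thin VMDK)",  80, 25),
    ("Guest LVM volume (thin)",        60, 20),
]

for name, provisioned, written in layers:
    print(f"{name}: {written}/{provisioned} TB "
          f"= {written / provisioned:.0%} full")
```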

Level 14

We are pretty much running at full at present (with some squirreled away just in case). However, we have just signed off on a £3M kit replacement, which will be two HCI (Hyper Converged Infrastructure) systems (one for each on-site data centre). This will give us a lot more CPU, a lot more memory, much better networking, and a hybrid disk/SSD data store. Should be fun setting it all up whilst migrating 1000+ servers around to allow the hardware to be replaced, with no downtime.

MVP

Don't forget fragmentation can play a part here as well. You may have a GB or more "free", but is it contiguous enough to provide the next extent or blocks needed?

Dedup and compression help quite a bit, but there is a case for defrag as well.

Level 20

Also, SSDs don't function exactly like spinning disks do... some utilities we'd normally use with spinning disks are actually not good for SSDs.

Level 9

Well written, thank you.

Level 9

Agreed! You might stop presenting new storage out to hosts because you've oversubscribed so much it's becoming a risk. And that's just capacity; what do you do if you've still got capacity to spare, but a system component (CPU) is overloaded due to compression algorithms?

Level 14

That's where Hyper Converged Infrastructure comes into its own.  You could just add more CPU without having to add any other hardware.

MVP

Lots of good information here, both in the article and the discussion.

I am curious as to how everyone is storing all that information. Are you using standard SANs, NAS, etc., or have you gone the way of the massive JBOD boxes like all of the big boys use, i.e. Google, Amazon, Microsoft, etc.?

Level 14

Currently several large SANs, but we are moving to a Hyper Converged solution on Dell VxRail, so the new storage will be part of that, made up of SSDs and spinning disks with loads of data deduplication and compression. It will be interesting to see it in action.

About the Author
Based out of the south of the Netherlands, working with everything from datacenters to operating systems. I specialize in storage, back-up & server virtualization systems and will blog about experiences in the field and industry developments. Certified EMCIEe, EMCTAe and EMCCAe on a number of Dell EMC products and constantly trying to find the time to diversify that list. When not tinkering with IT you can find me on a snowboard, motorbike or practicing a variety of sports.