What is your preferred RAID level?

I'd like to say "all things considered," but cost can be a huge factor in determining how much storage space one is willing to give up in exchange for gains in either performance or redundancy.

For example, for a nice balance between sheer performance on the one hand and high availability and failover on the other, I might be inclined to just choose RAID 10 as a matter of course, considering the increased read and write capabilities. When cost is factored in, though, alternatives like RAID 6 will yield much more usable storage at a sacrifice in performance.
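To put rough numbers on that space trade-off, here is a minimal sketch. It assumes identical drives and ignores filesystem overhead and hot spares; the function name and the 8 x 4 TB array are made up purely for illustration.

```python
def usable_tb(level: str, drives: int, drive_tb: float) -> float:
    """Usable capacity (TB) of an array of identical drives.

    Ignores filesystem overhead, hot spares and vendor-specific layouts.
    """
    if level == "raid10":
        if drives < 4 or drives % 2 != 0:
            raise ValueError("RAID 10 needs an even number of drives (>= 4)")
        return drives // 2 * drive_tb      # half the drives hold mirror copies
    if level == "raid6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least 4 drives")
        return (drives - 2) * drive_tb     # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")


# Hypothetical array: 8 drives of 4 TB each.
for level in ("raid10", "raid6"):
    print(level, usable_tb(level, 8, 4.0), "TB usable")
```

With those assumed numbers, RAID 10 leaves 16 TB usable and RAID 6 leaves 24 TB, which is the kind of gap that makes cost a deciding factor.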

When deploying HA storage schemes in server clusters, we've already invested in greater redundancy and removed the RAID controller, and indeed even the server, as a single point of failure. But on a single machine, where do your preferences tend to lie after you've factored in cost vs availability vs storage volume? Do you strive for a happy medium? Maximum storage? Maximum performance? And which scheme do you find yourself leaning towards more often than not?

Your thoughts?

  • I'm going to go back to the good-ole-days and channel some of my time as a Messaging Administrator.  In the days of Exchange 2013 (yes, those days), Microsoft recommended that you use physical machines with JBOD disk arrays thereby letting the Database Availability Groups (DAGs) be your failover mechanism.  I, personally, loved this idea for a few reasons.

    1. Easier management from the storage side - no RAID to worry about, no costly controllers, the disks could be whatever sizes
    2. No 'single point of failure' because the load was shared among different servers.
    3. Better raw disk : data ratio.  What you normally lost in building a RAID10 (or 1+0 or 0+1) you could use to just setup another server hosting copies of those databases, reducing your possible failure domains.
    4. In the event of disk failure, all 'transactions' just moved to another server and you could replace the failed disk and re-sync the changes.
    5. Power savings - Mass Storage Arrays (like those from HP, Dell, and others) are power hogs and throw off a considerable amount of heat.

    Still channeling those days, there are some other considerations that matter in today's world.

    1. IOPS and Latency - if you need incredibly high IOPS, nothing beats a good RAID controller (with battery and flash-backed cache) and a bunch of identical disks
    2. Flash is here to stay - you can still get incredibly high performance using flash (or flash supplemented) arrays.  At my house, I have a NAS with a pair of flash disks to cache for a RAID 5 array and the speeds are nothing to sneeze at for a prosumer-level product.
    3. NVMe has been a game-changer (at least from my personal computing perspective) and I'm assuming the same holds for data center storage as well.
    4. Data center-to-data center replication - people seem to downplay the idea that backups (and/or replication) are also heavy read workloads.  Be sure to scale for those as well.

    "The only constant in IT is change" has been said over and over again - and storage technologies are no exception.  This is where I used to lean heavily on our storage admins.  In our server provisioning process, we didn't fill out a form saying we needed 40GB Boot and 128 GB Data on RAID10, we said, we needed 40GB Boot and 128 GB with 7200 IOPS.  It was up to the storage admins to handle the math side of it.

    All that being said: if money is no object (we all know it is) and you are the one-person-IT shop (many of us work with and rely upon other informed, intelligent trustworthy professionals), then you can't really go wrong with RAID10.  Just don't think that throwing IOPS at badly written code will fix all of your problems, because it won't any more than throwing RAM or CPUs at it.
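    For anyone curious what "the math side" of an IOPS-based request like the one above might look like, here is a rough sketch. It uses the conventional rule-of-thumb write penalties (2 for RAID 10, 4 for RAID 5, 6 for RAID 6) and ignores controller cache entirely. The 30% write mix and 180 IOPS per drive are assumptions I picked for illustration, not numbers from our actual provisioning process.

```python
# Conventional back-end I/Os per front-end write (rule-of-thumb values).
WRITE_PENALTY = {"raid10": 2, "raid5": 4, "raid6": 6}


def drives_needed(level: str, frontend_iops: int, write_pct: int,
                  iops_per_drive: int) -> int:
    """Minimum drive count to serve frontend_iops, ignoring controller cache.

    Does not enforce per-level minimums or RAID 10's even drive count.
    """
    if not 0 <= write_pct <= 100:
        raise ValueError("write_pct must be between 0 and 100")
    read_pct = 100 - write_pct
    # Back-end IOPS scaled by 100 so everything stays in integer arithmetic.
    backend_x100 = frontend_iops * (read_pct + write_pct * WRITE_PENALTY[level])
    return -(-backend_x100 // (100 * iops_per_drive))  # ceiling division


# Assumed workload: 7200 front-end IOPS, 30% writes, 180 IOPS per drive.
for level in ("raid10", "raid5", "raid6"):
    print(level, drives_needed(level, 7200, 30, 180), "drives")
```

    Under those assumptions the same 7200 IOPS request works out to 52 drives on RAID 10, 76 on RAID 5 and 100 on RAID 6, which is why asking for IOPS rather than a RAID level leaves the storage admins room to pick the layout.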

    Thanks for listening to a non-storage admin speak a little bit about his glory days learning from and working with good storage admins.

  • Different RAID levels for different volumes on the server, depending on the function of each. For example, you might use RAID 1 on the OS partition and RAID 5 on a data partition...and then take things from there if you decide that you want/need higher levels of protection.

  • I think I would be somewhat hesitant to incorporate RAID 5 on modern storage architectures. In the olden days it was good, but there are issues approaching and exceeding the 2TB volume level on RAID 5 that would lead me to suggest using one of the various RAID 6 levels instead when exceeding 2TB.
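    One commonly cited reason for that hesitation (my assumption about which issue is meant here) is the chance of hitting an unrecoverable read error (URE) while reading every surviving drive during a rebuild. A very simplified sketch, assuming the often-quoted spec figure of one URE per 1e14 bits and treating errors as independent, which real drives may well beat:

```python
import math


def p_ure_during_rebuild(surviving_drives: int, drive_tb: float,
                         ure_per_bit: float = 1e-14) -> float:
    """Probability of at least one URE while reading all surviving drives.

    Simplified model: independent bit errors at a constant rate.
    """
    bits_read = surviving_drives * drive_tb * 1e12 * 8
    # 1 - (1 - p)^n, computed in a numerically stable way.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))


# Hypothetical 3-drive RAID 5 of 2 TB drives: two survivors must be read in full.
print(f"{p_ure_during_rebuild(2, 2.0):.1%}")
```

    For that hypothetical array the model gives a bit over one chance in four of hitting a URE mid-rebuild, and the figure climbs quickly with more or larger drives. RAID 6 can still correct such an error during a single-drive rebuild, which is the usual argument for it.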

    Another user above offered a good treatment of larger volumes, and the following resources can also help inform such decisions when deploying infrastructure :)

    https://wintelguy.com/raidmttdl.pl

    https://www.baarf.dk/BAARF/RAID5_versus_RAID10.txt (The BAARF site itself is somewhat amusing, with other articles too).

    Note that in the latter article, the author distinguishes between RAID 0+1 and RAID 1+0. Many admins don't pay much attention to the difference because, functionally, the two deliver the same superior performance and redundancy, but when it comes to rebuilds, one definitely outshines the other - I'll leave it to the reader to digest that part of the article.

    We're definitely moving away from a model where hardware RAID is the go-to choice for data integrity. File systems like ZFS and Btrfs, amongst others, require direct access to the drives, so placing them behind a hardware RAID controller introduces a high likelihood of data corruption (software RAID is fine with these file systems). Moreover, as storage continues to become more affordable, with large-volume SSDs exceeding 100TB commonplace even in consumer-based hardware, EC (Erasure Coding) has become the darling of larger infrastructure implementations.

    Here are a couple of links on object and file level Erasure Coding:

    https://www.computerweekly.com/feature/Erasure-coding-vs-RAID-Data-protection-in-the-cloud-era

    https://blog.westerndigital.com/jbod-vs-raid-vs-erasure-coding/ 
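    As a rough illustration of why EC is attractive at scale, here is a small sketch of the space efficiency of a k+m scheme (k data fragments plus m parity fragments, assuming an MDS code such as Reed-Solomon). The 8+3 profile is just an example I picked, not a recommendation from either article.

```python
def ec_profile(k: int, m: int) -> tuple[float, int]:
    """Return (usable fraction of raw capacity, fragment losses tolerated)
    for a k+m erasure-coded layout, assuming an MDS code."""
    if k < 1 or m < 0:
        raise ValueError("need k >= 1 data fragments and m >= 0 parity fragments")
    return k / (k + m), m


# Hypothetical 8+3 profile versus plain 2-way mirroring (k=1, m=1).
for name, (k, m) in {"EC 8+3": (8, 3), "mirror": (1, 1)}.items():
    efficiency, tolerated = ec_profile(k, m)
    print(f"{name}: {efficiency:.0%} usable, survives {tolerated} losses")
```

    In this sketch the 8+3 layout keeps roughly 73% of raw capacity while surviving three lost fragments, versus 50% and one loss for a simple mirror.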

    For my two cents, when it comes to RAID, I'm a RAID 10 kinda guy - to me, the loss of available space is [nowadays] negligible when one considers the cost of drives in the marketplace, but I can remember an era where I opted for RAID 5 because I wanted every ounce of available space while not being able to justify RAID 10 (1+0) in the budget, lolz.... but then again, the term "fool's errand" comes to mind, because I also remember doing low-level RLL formats on MFM drives to squeeze a few extra MBytes out of them too (That's your cue to start flaming me).

    Bottom line, when we consider the MTBF of hardware, we should weigh it against rebuild times for arrays, and at some point abandon RAID completely, since it becomes the "fool's errand" at some not so arbitrary capacity where rebuild times exceed the expected time to an additional drive failure in the array. When approaching those sizes, EC starts to really shine, IMNSHO :) The first link I posted in this post can help a lot with such advanced planning.
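    To make the rebuild-time-versus-MTBF comparison a bit more concrete, here is a back-of-the-envelope sketch. It assumes a constant failure rate (exponential model) and independent drives, both of which are optimistic, and the 16 TB drive, 100 MB/s rebuild rate and 1,000,000-hour MTBF are made-up inputs rather than measurements.

```python
import math


def rebuild_hours(drive_tb: float, rebuild_mb_per_s: float) -> float:
    """Hours to rewrite one full drive at a sustained rebuild rate."""
    return drive_tb * 1e12 / (rebuild_mb_per_s * 1e6) / 3600


def p_failure_during_rebuild(remaining_drives: int, hours: float,
                             mtbf_hours: float) -> float:
    """Probability that at least one remaining drive fails within the window,
    assuming independent drives with a constant failure rate of 1/MTBF."""
    return -math.expm1(-remaining_drives * hours / mtbf_hours)


# Hypothetical 8-drive array of 16 TB drives, one already failed.
hours = rebuild_hours(16.0, 100.0)
print(f"rebuild window: {hours:.1f} h")
print(f"P(another failure): {p_failure_during_rebuild(7, hours, 1_000_000):.4%}")
```

    The per-rebuild probability looks small with these inputs, but it scales with both drive size and drive count, and correlated failures (same batch, same age, rebuild stress) push it well beyond what this simple model suggests.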

    I would love to hear everyone's thoughts on where they feel those cross-over points might exist for them conceptually.

  • Absolutely agree. I was intending to start with a very basic response and then add more detail, but it looks like a big chunk of that didn't copy/paste correctly and I was on my phone, so didn't notice it. I would delete that waste of a post if THWACK would let me! Haha