Is it really an issue of "if money is no object"? I mean, is a cheaper system that is fundamentally broken (or will assuredly break in the foreseeable future) somehow MORE cost effective than a more expensive solution which will actually work?
This was the argument that jbiggley, afox ciulei, and the rest faced at my last company. The original argument was "well, we think we'll be fine with RAID5. And we'll convert you IF AND WHEN we see it isn't working."
But it took almost a year of poor performance, lost data, fire drills, etc., followed by about six months of intense data gathering, to finally justify what the vendor had told us from the very start: that RAID5 was not going to suffice.
Given the loss of staff time, plus lost opportunity cost, plus the various stop-gap fixes we put in place and initial ideas that didn't pan out, I don't think we saved anything.
I get your frustration. My laissez-faire response was due to the fact that I know Tim can afford to use RAID 10. He works in the same office with me and often remarks on my lack of street cred. Obviously, the benefits of RAID 10 are evident in performance, to the user, and in the lack of headaches and tension among IT staff. Some agencies (mostly State) aren't given the resources to purchase items as needed, only when absolutely necessary. Some managers (thankfully, none of mine) are so bottom-line focused that they demand a "make it work" attitude when it comes to planning and spending. Sometimes even good IT people are forced to make choices they would rather not live with. It is a shameful truth in every industry, not just in the IT department. We could be teachers, and then we'd really be mad (deservedly so).
You both make very valid points. You have to operate within the confines of the company and leadership you work for. You just need to do all you can to push for the recommended specs from SolarWinds and document the risks of not going with RAID10, if that's leadership's decision. I'll add that we were put (not by our choosing) on RAID5 SAN storage when SolarWinds was brought in, mainly as a network monitoring tool. Two years later, we've got many more modules and have expanded heavily into SAM. So even if RAID5 was an acceptable choice at the beginning, it really caused us trouble as the SolarWinds platform grew. Now we are on local RAID10 storage, and many stability headaches are gone.
Performance of various disk subsystems:
2-disk mirror ::= 2 logical reads ~= 1 physical read per disk ; 1 logical write = 2 physical writes
4-disk RAID 5 ::= 1 logical read = 4 physical reads ; 1 logical write = 4 physical writes
4-disk RAID 10 ::= 2 logical reads ~= 1 physical read per disk ; 1 logical write = 2 physical writes
10-disk RAID 5 ::= 1 logical read = 10 physical reads ; 1 logical write = 10 physical writes
10-disk RAID 10 ::= 10 logical reads ~= 1 physical read per disk ; 1 logical write = 2 physical writes
What this means is that the read performance of a RAID10 array scales upward with the number of spindles (a disk block can be read from either spindle in a mirrored pair, and if blocks are randomly distributed across spindles, many disks can be supplying data at once), but the write performance remains constant (each logical write has to be written to two disks, no matter how large the array).
With RAID5, in this model, you're reading every disk in the array and writing every disk in the array, so as you add more disks, performance degrades.
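The simplified model above can be sketched in a few lines of code. This is my own illustration of the poster's model (not any SolarWinds tooling, and a simplification of how real controllers behave), just to show how the physical I/O count diverges as spindles are added:

```python
# Simplified I/O-amplification model from the list above:
# - RAID 10: reads spread across spindles; each logical write costs 2 physical writes
# - RAID 5 (per the model above): every disk participates in every logical I/O

def physical_io(raid_level, disks, logical_reads, logical_writes):
    """Total physical I/O operations implied by the simplified model."""
    if raid_level == "raid10":
        # Any spindle can serve a read; every write hits one mirrored pair.
        return logical_reads + 2 * logical_writes
    if raid_level == "raid5":
        # Per the model above, each logical I/O touches all N disks.
        return disks * (logical_reads + logical_writes)
    raise ValueError(f"unknown RAID level: {raid_level}")

# Write cost stays flat for RAID 10 but grows with spindle count for RAID 5:
for n in (4, 10):
    print(n, physical_io("raid10", n, 100, 100), physical_io("raid5", n, 100, 100))
# 4-disk:  RAID 10 -> 300 physical I/Os, RAID 5 -> 800
# 10-disk: RAID 10 -> 300 physical I/Os, RAID 5 -> 2000
```

The point the numbers make: adding spindles to RAID 10 buys read throughput without increasing write cost, while in this model RAID 5's per-operation cost grows with the array.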
I used to use a calculator somewhere to get the performance difference for the number of disks in each array.
Perhaps you should go down the real analysis route. Show the performance difference if you can.
I would like to get there for SolarWinds. But I am just as interested in getting the larger Enterprise SQL set up better. SharePoint's database and several other databases would benefit. It's all well and good to use RAID 5 for other things like backups, but SQL could benefit from different storage management.
It all comes down to the perceived value - is the higher dollar cost initially going to outweigh the ongoing costs of a slower database?
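To put rough numbers on that trade-off, one commonly cited rule of thumb uses per-array write penalties of 2 for RAID 10 (mirror) and 4 for RAID 5 (read-modify-write of data plus parity). A hedged back-of-the-envelope sketch, where the 150 IOPS per-spindle figure is an assumed value for a 10K drive, not a measurement:

```python
def effective_write_iops(disks, per_disk_iops, write_penalty):
    """Rule-of-thumb random-write IOPS: raw array IOPS divided by the write penalty."""
    return disks * per_disk_iops / write_penalty

# Assumption: 150 IOPS per 10K spindle. Penalties: RAID 10 = 2, RAID 5 = 4.
raid10 = effective_write_iops(10, 150, write_penalty=2)  # 750.0
raid5 = effective_write_iops(10, 150, write_penalty=4)   # 375.0
print(f"10-disk RAID 10: {raid10:.0f} write IOPS vs RAID 5: {raid5:.0f}")
```

Even by this coarse estimate, the same ten spindles deliver roughly twice the random-write throughput as RAID 10, which is the kind of figure that can anchor the cost conversation.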
This is an age-old argument that cannot be won when dealing with bean counters.
adatole is 100% correct. Unless you have no other option, configure the DAS or the LUNs you present to your DB server (whether physical or virtual) for RAID10. The performance increase is substantial. I have promised Leon that I will write up a post on our experience in the next few weeks, but the summary is this:
We have nearly 11,000 nodes as well as a substantial footprint of APM monitors and VMAN integration for about 700 VMware hosts. By upgrading to a flash-based RAID10 array (and adding a custom non-clustered index to one of our tables) we all but eliminated table-locking wait time. One of the largest wait-time generators now looks like wait generated because the clients can't take the data fast enough. Now that is a problem every DBA would like to have, eh!?
The capital costs saved by RAID5 rarely offset the staff productivity lost versus a RAID10 solution, especially in larger monitoring environments. Friends don't let friends do RAID5 for enterprise monitoring.
I will look forward to your bigger writeup on this. I have tried time and again to convince our storage admin that we need some RAID 10 for our SolarWinds and Enterprise databases, but he won't budge even when I send links outlining the benefits. He just refuses to believe there could be a bottleneck there. He uses RAID 5 for all the SAN space we have and for the occasional direct-attached storage.
My next path is to try and convince the powers that be to let me get Storage Resource Monitor to prove the bottleneck. Hard sell in this lower budget cycle.
Maybe you could do a 30-day trial to capture the metrics needed and then do the sell AFTER the fact to guarantee prevention of future bottleneck-inducing architectural decisions!
I have thought about that. I might have to pull a few teeth to get the resources for a dev VM to put it on. Not sure I want to put the trial on the production machine.
I hear ya on that - I guess it's a matter of whether you want to beg once for a dev VM, or beg for the next three years to storage admins who don't understand the write speed differences between RAID5 and RAID10.
I've experienced that kind of problem - the company standard for NAS/SAN configuration is RAID 5 as the only option allowed :-(. We have around 6,000 elements and around 30K pollers, and performance is acceptable, though not impressive. More nodes/interfaces/applications would probably decrease performance significantly.
This is what I remember, but others may have more cogent responses.
RAID 5's cheaper drivewise, but that's about it.
RAID 10's faster during rebuilds, faster writes, etc.
If you can afford the drives, I'd go 10. Just my 2c, though!