Hi all,
So last week was a complete PITA. All my projects got pushed to the side thanks to the total failure of a supposedly safe, redundant storage array. Thanks to Murphy’s Law, both the safe and the redundant went out the window.
In technical terms, I had a RAID 5 array of 4 x 2TB HDDs plus one spare. All 5 drives were connected to a hardware RAID controller in an older HP DL380G5 server (below). That gave about 5.6TB of space, and I’d say about 4.5TB of it was used…

Sometime on Monday one drive failed. All good, as the data just rebuilds onto the spare, which slows down access enough for us to notice: “OK, dead drive, I’ll get another”. On Tuesday I bought a new drive and fitted it in place of the failed one. Data started to transfer from the spare to the new active drive… then the spare failed. OK, no big issue, the 3 active drives still had enough data to rebuild the new drive, so that started… Then, just because it could go wrong, it did: A THIRD DRIVE FAILED!!! So the array shut down and gave me the big middle finger… Sigh.

OK, reboot the machine and ask it to re-activate the failed drives, hoping it’s just a read failure and the data is still OK. It came up, did its parity check and started rebuilding the data to the new drive again. All good, until it failed again during the 2nd rebuild attempt. Reboot again. This time not so good: the parity initialization took almost 2 days. No notice of any further rebuilds… hmm, maybe that means the last rebuild finished before it failed? So I ran a file system check on it. Nope, it’s fubar.

In all, about 95% of my 100GB worth of personal digital photos were corrupted, along with a small chunk of the e-books, half my ~800GB of anime, and 2 episodes of TV shows. Again, thank you Murphy: the ~3TB of TV and movie videos that are replaceable are fine, and the damn photos that aren’t get destroyed. Thanks… After digging through old DVD backups and the portable HDDs I have, I managed to recover about 90% of the lost photos. The rest are gone forever…
Here is my set-up. It’s not pretty, but it has done me good for a few years until now:
With drives numbered 1 on the left to 6 on the right:
1. FAILED FIRST drive – 2TB WD Green.
2. The spare: FAILED SECOND! – 2TB Samsung Desktop HDD.
3. Active drive – still OK – 2TB WD Green.
4. The new drive to replace #1 – 2TB Seagate Desktop HDD.
5. The THIRD FAILED drive! – 2TB WD Green.
6. Active drive – still OK – 2TB WD Green.
Admittedly these are the cheaper Western Digital Green drives, and the spare is a basic Samsung desktop drive. Not ideal for a NAS/RAID set-up, but still brand name drives. Also, the server is designed for 2.5″ SAS/SATA drives in HP caddies. 2TB 2.5″ drives are way out of my price range, so I used short SATA extension cables to connect up the 3.5″ drives and bolted them together to a rail for stability. As I said, this set-up has done me fine for a few years with no issues until now. One drive failing is expected, two failing is possible, but three drives failing? That’s just Murphy trying to upset me…
So now I’ve fired up two spare Dell servers, each with 4 x 1TB drives in a RAID-0 array, so I get about 3.7TB of space on each but no data safety within the array itself. All the recovered data was put on one of these, and the second is set up to mirror all that data to its own identical array, so I get some protection for the surviving data.
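A mirror is only useful if the copy actually matches, and silent corruption is exactly what bit the photos. For anyone wanting to spot-check a mirror like this, here is a minimal sketch that hashes every file under two directory trees and reports anything missing or different. The function names are made up for illustration, and it assumes both arrays are mounted as ordinary directories on one machine, which is not necessarily how these two servers are set up.

```python
import hashlib
from pathlib import Path


def file_sha256(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large videos do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def tree_hashes(root):
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): file_sha256(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_trees(primary, mirror):
    """Return (missing, mismatched) relative paths.

    missing    -- files in primary that do not exist in mirror
    mismatched -- files present in both whose contents differ
    """
    primary_hashes = tree_hashes(primary)
    mirror_hashes = tree_hashes(mirror)
    missing = sorted(set(primary_hashes) - set(mirror_hashes))
    mismatched = sorted(
        name
        for name in primary_hashes
        if name in mirror_hashes and primary_hashes[name] != mirror_hashes[name]
    )
    return missing, mismatched
```

Calling `compare_trees("/mnt/primary", "/mnt/mirror")` (both paths hypothetical) returns two lists, and two empty lists mean every file on the primary has an identical copy on the mirror. Hashing terabytes of video takes a while, so it may make sense to point it at the photo folders first.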
Since I don’t want these extra servers adding to my power bill (each reports about 200W of power use), this is only temporary. I’ll need to rebuild the original array with new drives, which will possibly take me a few months to afford, as this time I’m paying the extra for proper NAS drives. I’m currently looking at 5 or 6 2TB Seagate NAS drives that I can get locally for about $140 each. Still a lot all up… I’m planning the new array to be a 6-drive RAID-6 array. I’ll still be in trouble with a 3-drive failure, but there aren’t many other options. I’ll be doubling up on safety by making extra copies of things like photos onto spare portable HDDs and storing them in a safe place…
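To put numbers on the trade-off between the three layouts mentioned in this post, here is a rough sketch of the standard capacity arithmetic: RAID 0 keeps no parity, RAID 5 gives up one drive's worth of space to parity, and RAID 6 gives up two. The function names are invented for the example, spares are not counted, and the results are raw figures from the drive labels, so what a controller or OS actually reports will come out a bit different.

```python
def raid_summary(level, drives, drive_tb):
    """Return (usable capacity in decimal TB, drive failures survived).

    Covers only RAID 0, 5 and 6. Spare drives are not counted because
    they hold no data until a rebuild starts.
    """
    parity_drives = {0: 0, 5: 1, 6: 2}  # drives' worth of space lost to parity
    min_drives = {0: 2, 5: 3, 6: 4}     # smallest array each level allows
    if level not in parity_drives:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < min_drives[level]:
        raise ValueError(f"RAID {level} needs at least {min_drives[level]} drives")
    parity = parity_drives[level]
    # The number of failures survived equals the number of parity drives.
    return (drives - parity) * drive_tb, parity


def tb_to_tib(tb):
    """Convert decimal terabytes (10**12 bytes) to binary tebibytes (2**40 bytes)."""
    return tb * 10**12 / 2**40


if __name__ == "__main__":
    layouts = [
        ("old array, RAID 5, 4 x 2TB", 5, 4, 2),
        ("temporary, RAID 0, 4 x 1TB", 0, 4, 1),
        ("planned,   RAID 6, 6 x 2TB", 6, 6, 2),
    ]
    for label, level, drives, size in layouts:
        usable, tolerated = raid_summary(level, drives, size)
        print(f"{label}: {usable} TB ({tb_to_tib(usable):.2f} TiB), "
              f"survives {tolerated} drive failure(s)")
```

By this arithmetic the old array had 6 TB raw and could survive one failure, each temporary RAID-0 box has 4 TB and survives none, and the planned 6-drive RAID-6 would give 8 TB while surviving any two failures. A third failure still loses the array, which is why the extra photo copies on portable drives matter.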
I now know more about RAID controller configuration and operation than I ever wanted to know…
Until next time. Thanks for reading!
— Radan.