RAID the Hard Way

It has been a long, frustrating road to using RAID for my Time Machine backups. Hopefully, you can learn what not to do from this. - by David R. Beebe
I optimistically purchased a Mercury Elite-AL Pro QX2 from OWC (macsales.com) back in December 2010. I had previously purchased two 1.5TB WD SATA green drives to use alternately in a NewerTech drive cradle with SuperDuper. Because I already had 2 discs, I ordered 2 more from Amazon and the empty RAID cabinet from OWC. I hooked it up as the first drive in my FW800 chain.

Lesson 1: The discs in the RAID cabinet must all have the same firmware, so I had to order 2 more from Amazon to get a matching set of 4. This also means that if a disc fails and that firmware is no longer available, you have to replace all 4 discs. In my case, they are all WD15EARS.



I started out in RAID 5 (setting 9) across all 4 discs: the capacity of 3 discs plus 1 disc's worth of redundancy. This gave me a 4.5TB backup drive that could survive a single disc failure.
A few months into the backup, I got an error light/alarm on the QX2 for the disc in Slot C. I spoke with OWC Tech Support. While green drives are not listed as incompatible, I was told that because they spin at variable speeds, they were a poor choice for RAID 5. A write can span discs and may fail to complete consistently.

Lesson 2: Nothing in the list of incompatible discs said that this would be a problem. There were a lot of known issues with Seagate discs, but these were Western Digital.



Since the variable spin rate was suspected, I switched from the preferred RAID 5 mode to Disc Spanning (setting 2), which has no redundancy or speed advantage. This, of course, wiped out my Time Machine backup.
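To make the capacity trade-off between the two modes concrete, here is a minimal sketch of the arithmetic. It assumes 4 identical 1.5TB discs and the textbook definitions of RAID 5 (one disc's worth of parity) and spanning (plain concatenation). The function names are hypothetical and say nothing about how the QX2 implements either mode internally.

```python
# Sketch only: textbook capacity arithmetic, not the QX2's actual firmware logic.
# Function names are made up for illustration.

def raid5_usable_tb(disc_count: int, disc_tb: float) -> float:
    """RAID 5 reserves one disc's worth of space for parity."""
    if disc_count < 3:
        raise ValueError("RAID 5 needs at least 3 discs")
    return (disc_count - 1) * disc_tb


def spanning_usable_tb(disc_count: int, disc_tb: float) -> float:
    """Spanning concatenates the discs, so nothing is held back for redundancy."""
    return disc_count * disc_tb


if __name__ == "__main__":
    print(raid5_usable_tb(4, 1.5))     # 4.5 TB, survives one disc failure
    print(spanning_usable_tb(4, 1.5))  # 6.0 TB, no redundancy
```

Spanning buys 1.5TB of extra space at the cost of losing the whole volume if any single disc dies.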

The QX2 powered itself off and alarmed. There was no indication as to why this happened. Tech support said that there could have been a problem on the FW800 bus that caused the QX2 to panic. There was nothing in the console logs. After the 2nd failure in a month, tech support suggested trying a shorter FW800 cable. I also moved other FW800 discs in the chain to USB 2.0. This left 3 discs in the FW800 chain, all self-powered.

After limping along for the first year, in 2012 I started getting a blinking red error light (bad HDD on startup) on Slot C. I ran Drive Genius in repair mode against the RAID, which took days. It found 16 bad blocks out of 12 billion blocks. OWC couldn't say whether the Mac OS Extended (Journaled) filesystem, the disc hardware, or the QX2 should manage sparing bad blocks. Soon after that, the QX2 flagged Slot C with a steady red error light (HDD error). Tech Support suggested pulling the disc and letting Disk Utility repair it. That didn't work, as the drive showed as unreadable. Western Digital replaced the disc under warranty with one that has the same firmware.

Lesson 3: Ordered empty, the QX2 has a 1-year warranty instead of the 3 years you get if it is purchased with discs. In the long run, buying it with discs would have been less expensive, since I ended up with 2 more 1.5TB discs than I needed because of the matching-firmware requirement.



Again the QX2 powered itself off and alarmed. Tech support thought that 3 chained discs was too many (despite the FW800 standard supporting 63 chained, self-powered devices). I pulled everything but the QX2 and the LG BD-R burner from the FW chain. I had to spend days with Drive Genius again to deal with a few bad blocks on Slot C (which had recently been replaced).

After this, I moved the QX2 to USB. I figured it might be the one having the problem with the FW800 bus. The problem with Slot C started up again. With Disk Utility I tried erasing the volume. It progressed to the halfway point on the progress bar (up to Slot C) and then didn't make any progress other than to keep extending the expected completion time. I let it get up to 7 hours and stopped it. I re-partitioned the volume instead. USB is just too slow for this kind of volume. I stopped Time Machine and moved the QX2 back to FW800.

To see if the problem follows the disc, I took the QX2 offline and reordered the discs: I moved C to A, D to B, A to C and B to D. To do this, I had to change the RAID mode and let the QX2 rebuild before setting it back to 4-disc Disc Spanning (setting 2).

Now I am faced with another loss of Time Machine history and I am no closer to knowing what is wrong. I can start up Time Machine again with the QX2 on USB 2.0, but that will take forever to catch up on 4+TB. The only other option is to pay an out-of-warranty fee for OWC to diagnose the QX2, but I am not confident they will find a problem. I'll know more if the QX2 goes offline on USB or if the disc in Slot C errors again. In the meantime, I am moving some of my discs back to FW800.
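For a rough sense of what "forever" means, here is a back-of-the-envelope sketch of the best-case time to push 4TB over each bus. It uses the nominal signalling rates (480 Mbit/s for USB 2.0, roughly 800 Mbit/s for FW800) and decimal terabytes. The `efficiency` parameter is a hypothetical fudge factor I added for illustration; real throughput is well below nominal, so treat these numbers as lower bounds, not measurements from my setup.

```python
# Sketch only: best-case transfer time from nominal bus rates.
# The efficiency factor is an assumption for illustration, not a measured value.

NOMINAL_MBIT_PER_S = {
    "USB 2.0": 480,  # High Speed nominal signalling rate
    "FW800": 800,    # FireWire 800 nominal rate (approximate)
}


def min_transfer_hours(size_tb: float, mbit_per_s: float, efficiency: float = 1.0) -> float:
    """Hours needed to move size_tb decimal terabytes at the given link rate."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    size_bits = size_tb * 1e12 * 8
    seconds = size_bits / (mbit_per_s * 1e6 * efficiency)
    return seconds / 3600


if __name__ == "__main__":
    for bus, rate in NOMINAL_MBIT_PER_S.items():
        print(f"{bus}: at least {min_transfer_hours(4, rate):.1f} hours for 4 TB")
```

Even at the theoretical maximum, 4TB over USB 2.0 is most of a day, and Time Machine's first full backup will not come close to the theoretical maximum.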