The process of rebuilding a RAID drive from parity data can cause a RAID drive to fail.


Last updated January 4, 2016

A brief review of RAID 5

In a simple configuration, a RAID 5 consists of three or more hard drives. The drives are logically divided into blocks of equal size. A stripe consists of the blocks from all drives at the same offset from the beginning. In the example used here, a 3-drive RAID 5 is configured with a block size of 64KB. Note that counting starts from 0, as is common practice in computer science.
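The block and stripe arithmetic described above can be sketched in a few lines of Python. The function names are hypothetical, and the parity rotation shown is an assumption: the text only says that the parity block of stripe #0 sits on Disk 2, and real controllers use several different rotation schemes.

```python
BLOCK_SIZE = 64 * 1024  # 64KB blocks, as in the example
NUM_DRIVES = 3          # 3-drive RAID 5, as in the example


def stripe_of(drive_offset: int) -> int:
    """Return the stripe number that contains the byte at this offset on a drive."""
    return drive_offset // BLOCK_SIZE


def parity_drive(stripe: int) -> int:
    """Return the drive holding the parity block of a stripe.

    ASSUMPTION: parity starts on the last drive for stripe #0 and moves
    one drive to the left for each following stripe.
    """
    return (NUM_DRIVES - 1 - stripe) % NUM_DRIVES


for stripe in range(4):
    print(f"stripe #{stripe}: parity on Disk {parity_drive(stripe)}")
```

Under this assumed rotation, the parity block returns to Disk 2 every three stripes.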


Each stripe consists of three blocks. Two blocks store data and one block stores parity data. The parity block is computed by applying the exclusive-or (XOR) operator on the data blocks. In stripe #0 the value of byte #0 in the parity block (on Disk 2) is computed as follows:

Parity byte #0 = (Disk 0 byte #0) XOR (Disk 1 byte #0)

Note that from the properties of XOR:

Disk 0 byte #0 = (Parity byte #0) XOR (Disk 1 byte #0)
Disk 1 byte #0 = (Parity byte #0) XOR (Disk 0 byte #0)

If one drive fails the lost data can be computed by applying the XOR operator on the surviving drives.
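As a minimal sketch of the XOR relationships above, the following Python snippet computes a parity block from two data blocks and then recovers each data block from the parity and the other block. The byte values and the helper name `xor_blocks` are made up for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length blocks byte by byte."""
    if len(a) != len(b):
        raise ValueError("blocks must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Illustrative contents of the two data blocks in one stripe.
disk0 = bytes([0x12, 0x34, 0x56])
disk1 = bytes([0xAB, 0xCD, 0xEF])

# Parity block, as stored on the third drive.
parity = xor_blocks(disk0, disk1)

# Either data block can be recomputed from the parity and the other block.
recovered_disk0 = xor_blocks(parity, disk1)
recovered_disk1 = xor_blocks(parity, disk0)

print(recovered_disk0 == disk0, recovered_disk1 == disk1)
```

The same function serves for both computing parity and recovering data, because XOR is its own inverse.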

RAID 5 degraded mode and rebuilding

When one drive fails a RAID 5 continues to operate in degraded mode. Data is still written as stripes but skipping the failed drive. When data needs to be read from a block on the failed drive, the RAID driver reads the remaining blocks in the stripe from the surviving drives and applies XOR to recompute the data of the missing block. In degraded mode, write performance is not affected. Read performance is slower due to the XOR process discussed. However the data is no longer protected from another drive failure.

Under this condition an operator typically performs a RAID rebuild. The failed drive is physically removed and replaced by a new drive. The RAID driver will automatically recompute the data on the failed drive and write it to the new drive. This process may take hours depending on the size of the drives. When rebuilding is complete the RAID is restored to normal working status.
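The rebuild described above can be illustrated with a toy model in Python. This is only a sketch under simplifying assumptions: each drive is modeled as a list of small in-memory blocks, the names `xor_all` and `rebuild_drive` are hypothetical, and a real RAID driver works on physical sectors and handles read errors.

```python
from functools import reduce


def xor_all(blocks):
    """XOR any number of equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_drive(surviving_drives):
    """Recompute every block of the missing drive from the surviving drives.

    Each drive is a list of blocks (bytes), one block per stripe. Within a
    stripe the XOR of all blocks, data and parity together, is zero, so the
    missing block is the XOR of the surviving ones whether it held data or parity.
    """
    num_stripes = len(surviving_drives[0])
    return [
        xor_all([drive[stripe] for drive in surviving_drives])
        for stripe in range(num_stripes)
    ]


# Toy 3-drive array with two stripes. For simplicity the third drive holds
# the parity of both stripes here; rotating the parity would not change the math.
d0 = [b"\x01\x02", b"\x0a\x0b"]
d1 = [b"\x10\x20", b"\x11\x22"]
d2 = [xor_all([d0[0], d1[0]]), xor_all([d0[1], d1[1]])]

# Pretend drive 1 failed and rebuild it from the other two.
print(rebuild_drive([d0, d2]) == d1)
```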

Rebuilding failure

During rebuilding the RAID driver reads every block on all the surviving drives. If it encounters any bit errors, the rebuilding operation is typically aborted, leaving the RAID in limbo: it may stay in degraded mode or it may go into total failure mode. If the RAID consists of a large number of drives with large capacity, the probability of rebuilding failure can be very high. For example, if the probability for a 2TB hard drive to have at least one bit error is 1% and errors on different drives are independent, then the probability of having at least one error in 12x2TB hard drives is 1 - 0.99^12, or about 11%.
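The 11% figure can be checked with a short calculation. The sketch below assumes, as the example does implicitly, that bit errors on different drives are independent; the function name is hypothetical.

```python
def p_any_error(p_per_drive: float, num_drives: int) -> float:
    """Probability that at least one of num_drives drives has a bit error,

    ASSUMING each drive fails independently with probability p_per_drive.
    """
    return 1.0 - (1.0 - p_per_drive) ** num_drives


p = p_any_error(0.01, 12)
print(f"{p:.1%}")  # prints 11.4%
```

The result grows quickly with the drive count, which is why large arrays of large drives are the risky case.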

To rebuild or not to rebuild?

One may try and copy the data to new storage prior to rebuilding. However copying is subject to the same risk as rebuilding. If any error is encountered, the copying will stop. Copying, however, only requires reading from used space. The risk of failed copying for the example above is tabulated as follows:

Used space    Failed rebuild probability
100%          11%
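To see how the risk might change when only part of the array is in use, here is a rough model in Python. It is an illustration rather than data from the table above: it assumes that bit errors are spread evenly across each drive, so that reading a fraction of a drive scales the exposure accordingly, and it reuses the 1% per-drive figure and the 12 drives from the earlier example. The function name is hypothetical.

```python
P_PER_DRIVE = 0.01  # probability of at least one bit error on a full 2TB drive
NUM_DRIVES = 12     # drives in the example array


def copy_failure_probability(used_fraction: float) -> float:
    """Estimated probability that copying hits at least one bit error.

    ASSUMPTION: errors are uniformly distributed, so reading a fraction u
    of a drive succeeds with probability (1 - P_PER_DRIVE) ** u.
    """
    return 1.0 - (1.0 - P_PER_DRIVE) ** (NUM_DRIVES * used_fraction)


for used in (0.25, 0.50, 0.75, 1.00):
    print(f"{used:.0%} used: {copy_failure_probability(used):.1%}")
```

At 100% used space the model reproduces the roughly 11% figure; lower usage gives proportionally lower risk under these assumptions.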

If you are in the unfortunate situation where your RAID array has failed or is running in a degraded RAID mode, most likely you’ll need to rebuild it.

RAID recovery can be complex and risky, but the steps below will guide you through rebuilding the array properly.

The goal here is to bring the array back online without losing data.

Also, a few pointers on how each RAID configuration works are helpful for successful data recovery of RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10 arrays.

Once you understand how a RAID array works, you’ll see under what circumstances you might need to rebuild a RAID volume. Here’s the full guide to preventing RAID data loss!

How to Rebuild your RAID

Rebuilding a RAID is necessary when one disk of the array stops working, even if it does so temporarily and everything seems to be functioning correctly.

When one hard drive goes offline, the workload on the others increases because they have to take over the tasks of the defective one. It is therefore only a matter of time before the remaining hard drives suffer the same fate. If that happens, RAID repair and data recovery become much more difficult, and sometimes almost impossible, to accomplish.

Even if immediate repair is needed, here are a few strong recommendations to follow throughout the process of RAID reconstruction without erasing data:


  • Before rebuilding your array, create a RAID structure image, as well as a backup on a separate volume. These actions will secure your data immediately before restructuring.
  • Do not create a new RAID on old drives! That will ruin the newly-created array and all your previous data.
  • If you can’t access and/or back up all the data stored in your RAID, do not attempt to rebuild it; contact data recovery professionals instead. However, if you can back it up, do it first and fast to stop RAID data loss. Any failed rebuild attempts lower the chances of successful data recovery.
  • Until the data is fully recovered, you must be careful in all your actions: do not create, copy, move, append, delete or save any files on the disk, and do not open any large programs or applications, as this can overwrite data on a damaged disk.
  • Rebuilding a RAID can be quite complex; performing repairs on your own may cost you all the important files stored in the disks. In case you are not completely confident in your technical skills, it will be much wiser to delegate the rebuilding to Data Recovery specialists.
  • Do not remove more than one disk simultaneously from its initially installed position because you may lose track of the sequencing of the drives. Labeling the drives and the matching slots as you remove them should help.
  • Never ignore RAID subsystem failure warnings or malfunctions.

6 Steps to Rebuild a Failed RAID Array

This guide should help you to rebuild a failed RAID array. Follow the steps in sequence to avoid losing data:

Step 1. Prepare the array

Determine and secure the current state of an array; label the drives, wires, cables, ports, controller configuration, etc.

Step 2. Connect it to the Controller

Disconnect the array member disks and connect them to a controller capable of working with separate disks (either a non-RAID controller or a RAID controller in single-drive mode).

Step 3. Recover Array Parameters

Launch RAID recovery software and recover the array parameters.

Step 4. Rebuild New Array

If the RAID monitoring application and controller allow you to build the array without initializing the disks, then try to do so in this mode, using the parameters determined by the RAID recovery software. Be very cautious when attempting this: if the array is rebuilt with the wrong disk order, the data will be lost.

Step 5. Write Data to New Array

In the case of a hardware RAID, you can write the array to the disk and then try to mount the disk in whatever operating system was previously used. If you don’t have a disk that is large enough, you can build a temporary array and write the data to it. In any case, never write any data to the member disks of the original array.

Step 6. Copy Data Back

Once your information is saved and checked (always open at least several large files to be sure that data is recovered properly), rebuild the original RAID and copy data back. 
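In addition to opening a few large files by hand, the copy-back can be checked automatically by comparing checksums. The sketch below is one possible approach, not part of the original procedure: the function names are hypothetical, and it only confirms that the files copied back to the rebuilt RAID match the recovered copies byte for byte, not that the recovered copies themselves are intact.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(source_dir, copy_dir):
    """List files under source_dir that are missing or different under copy_dir."""
    source_dir, copy_dir = Path(source_dir), Path(copy_dir)
    mismatches = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = copy_dir / rel
        if not dst.is_file() or sha256_of(src) != sha256_of(dst):
            mismatches.append(rel.as_posix())
    return mismatches
```

A call such as `find_mismatches("/mnt/recovered", "/mnt/raid")` (both paths are placeholders) returns an empty list when every file was copied back intact.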

Keep in mind that RAID rebuild time depends on two things: the amount of data that has to be recomputed and the capacity of the array itself. A smaller RAID array can be rebuilt in just a few hours, while larger ones can take well over 24 hours.
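For a rough feel for these durations, the rebuild time can be approximated as the capacity of the replaced drive divided by the sustained rebuild rate. The sketch below is a back-of-the-envelope estimate only: the 100 MB/s rate is an assumed figure, actual rates vary widely with the controller and with the load on the array, and the function name is hypothetical.

```python
def estimate_rebuild_hours(drive_bytes: float, rebuild_bytes_per_sec: float) -> float:
    """Estimate rebuild time in hours, ASSUMING a constant rebuild rate."""
    return drive_bytes / rebuild_bytes_per_sec / 3600


# Example: a 2TB drive rebuilt at an assumed 100 MB/s.
hours = estimate_rebuild_hours(2e12, 100e6)
print(f"about {hours:.1f} hours")
```

If the array stays in use during the rebuild, the effective rate is usually lower and the estimate should be treated as a best case.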

Recovery by RAID Configuration Type

Now that you know the steps for rebuilding a degraded RAID in general terms, it’s important to consider recovery for each RAID array type (or level).

Each RAID configuration has its own set of redundancies — which means different fault tolerances. So, before you carry on with your RAID drive recovery, it will be helpful to get a little background information on those differences:

RAID 0 Recovery

Based on a striping technique, also known as a “stripe set”, RAID 0 uses two or more disks in order to improve server performance. However, RAID 0 has no redundancy, so the failure of one hard drive in this configuration affects the whole array, which is why you won’t be able to rebuild it. Contact a professional for RAID 0 recovery.

RAID 1 Recovery

This configuration applies the disk mirroring technique where two or more disks mirror one another to prevent data loss. By copying data from one disk to another, it creates a mirror image of the information and a built-in copy of everything the user has done. During the initial mirror build, there is always a bit of lag in computer function — but the speed goes back to normal as soon as the procedure is accomplished. RAID 1 is used where system reliability is critical.  

RAID 5 Recovery

This type employs both striping and parity techniques to improve performance and reduce information loss. With disks arranged in a RAID 5 (three or more drives), one hard drive can go offline without any data being lost, which means that in the event of a corrupted or otherwise damaged drive, RAID 5 will continue to operate in a degraded state. RAID 5 recovery is more complicated, but the information can still be recovered even if one of the disks has failed. This array is most often used in large companies and enterprises.

RAID 6 Recovery

RAID 6 is very similar to RAID 5, but it uses two different parity functions. Although its recovery is very complex, RAID 6 is able to survive up to two simultaneous hard drive failures.

RAID 10 (1+0) Recovery

This array is based on mirroring and striping techniques so that it inherits RAID 1 fault tolerance and RAID 0 speed efficiency. Such a configuration is able to survive a single disk failure and, in some cases (when the failed disks belong to different mirror pairs), multiple simultaneous hard drive failures as well. RAID 10 is one of the most expensive types because, with two-way mirrors, half of the raw disk capacity is used for mirror copies.
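The differences between the levels covered here can be summarized in a small Python helper that returns the usable capacity and the number of drive failures each level is guaranteed to survive. This is a simplified sketch: it assumes identical drives and, for RAID 10, two-way mirrors, and the function name is hypothetical.

```python
def raid_summary(level: int, num_drives: int, drive_tb: float):
    """Return (usable capacity in TB, guaranteed number of drive failures survived).

    ASSUMPTIONS: all drives have the same size; RAID 10 uses two-way mirrors.
    """
    if level == 0 and num_drives >= 2:
        return num_drives * drive_tb, 0        # striping only, no redundancy
    if level == 1 and num_drives >= 2:
        return drive_tb, num_drives - 1        # every drive holds a full copy
    if level == 5 and num_drives >= 3:
        return (num_drives - 1) * drive_tb, 1  # one drive's worth of parity
    if level == 6 and num_drives >= 4:
        return (num_drives - 2) * drive_tb, 2  # two drives' worth of parity
    if level == 10 and num_drives >= 4 and num_drives % 2 == 0:
        # More failures can be survived if they hit different mirror pairs.
        return num_drives / 2 * drive_tb, 1
    raise ValueError("unsupported RAID level or drive count")


for level, drives in ((0, 4), (1, 2), (5, 4), (6, 4), (10, 4)):
    capacity, failures = raid_summary(level, drives, 2.0)
    print(f"RAID {level} with {drives} x 2TB: {capacity:g}TB usable, survives {failures} failure(s)")
```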

Need help or have questions about your failed RAID?

SALVAGEDATA’s data recovery service experts have been accomplishing RAID recoveries for individuals, Top 500 companies, and federal institutions all across the USA. Don’t hesitate to contact our RAID recovery help service line at 800-972-DATA (3282) for immediate assistance — day or night!