How to Rebuild a Failed Linux Software RAID

Ref: http://aplawrence.com/Linux/rebuildraid.html

Recently I had a hard drive fail. It was part of a Linux software RAID 1 (mirrored drives), so we lost no data and just needed to replace the hardware. However, the RAID does require rebuilding. A hardware array would usually rebuild automatically once the drive was replaced, but this one needed some help.

When you look at a “normal” array, you see something like this:

# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 hda3[1] hdb3[0]
262016 blocks [2/2] [UU]

md1 : active raid1 hda2[1] hdb2[0]
119684160 blocks [2/2] [UU]

md0 : active raid1 hda1[1] hdb1[0]
102208 blocks [2/2] [UU]

unused devices: <none>

That’s the normal state – what you want it to look like. When a drive has failed and been replaced, it looks like this:

Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hda1[1]
102208 blocks [2/1] [_U]

md2 : active raid1 hda3[1]
262016 blocks [2/1] [_U]

md1 : active raid1 hda2[1]
119684160 blocks [2/1] [_U]

unused devices: <none>

Notice that it no longer lists the failed drive’s partitions, and that an underscore appears in place of one of the Us. This shows that only one drive is active in these arrays – we have no mirror.
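Incidentally, that underscore makes a degraded array easy to spot from a script. As a rough sketch (my own one-liner, not part of the original procedure), something like this could run from cron:

# grep '\[.*_.*\]' /proc/mdstat || echo "all arrays healthy"

It prints any array line containing an underscore between the brackets, and only says “all arrays healthy” if none is found.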

Another command that will show us the state of the RAID drives is “mdadm”:

# mdadm -D /dev/md0
/dev/md0:
Version : 00.90.00
Creation Time : Thu Aug 21 12:22:43 2003
Raid Level : raid1
Array Size : 102208 (99.81 MiB 104.66 MB)
Device Size : 102208 (99.81 MiB 104.66 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Fri Oct 15 06:25:45 2004
State : dirty, no-errors
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

    Number   Major   Minor   RaidDevice  State
       0       0       0         0       faulty removed
       1       3       1         1       active sync   /dev/hda1
UUID : f9401842:995dc86c:b4102b57:f2996278

As this shows, we presently only have one drive in the array.
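If you just want the one-line answer rather than the full report, you can filter that output; this is only a convenience, nothing the procedure depends on:

# mdadm -D /dev/md0 | grep -E 'State :|Active Devices'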

Although I already knew that /dev/hdb was the other half of the array, you can look at /etc/raidtab to see how the RAID was defined:

raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdb2
    raid-disk               1

raiddev /dev/md0
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda1
    raid-disk               0
    device                  /dev/hdb1
    raid-disk               1

raiddev /dev/md2
    raid-level              1
    nr-raid-disks           2
    chunk-size              64k
    persistent-superblock   1
    nr-spare-disks          0
    device                  /dev/hda3
    raid-disk               0
    device                  /dev/hdb3
    raid-disk               1
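If your system has no /etc/raidtab (newer, mdadm-managed installs generally won’t), mdadm can reconstruct the same information from the on-disk superblocks. The output below is a sketch of what to expect; the UUIDs shown are the ones reported by mdadm -D above:

# mdadm --examine --scan
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f9401842:995dc86c:b4102b57:f2996278
ARRAY /dev/md1 level=raid1 num-devices=2 UUID=ede70f08:0fdf752d:b408d85a:ada8922b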

To get the mirrored drives working properly again, we need to run fdisk to see what partitions are on the working drive:

# fdisk /dev/hda

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 14946 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End     Blocks   Id  System
/dev/hda1   *         1        13     104391   fd  Linux raid autodetect
/dev/hda2            14     14913  119684250   fd  Linux raid autodetect
/dev/hda3         14914     14946    265072+   fd  Linux raid autodetect

Duplicate that layout on /dev/hdb: use “n” to create the partitions, and “t” to change their type to “fd” to match.
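If both drives are the same size, an alternative to stepping through fdisk is to copy the partition table in one shot with sfdisk. This is a standard trick rather than what I actually did here, so double-check that the source (/dev/hda) and target (/dev/hdb) are the right way around before running it:

# sfdisk -d /dev/hda | sfdisk /dev/hdb

Once the partition table on /dev/hdb matches, use “raidhotadd” to put each partition back into its array: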

# raidhotadd /dev/md0 /dev/hdb1
# raidhotadd /dev/md1 /dev/hdb2
# raidhotadd /dev/md2 /dev/hdb3
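“raidhotadd” comes from the older raidtools package. On a system managed with mdadm instead, the equivalent commands (same effect, different tool) would be:

# mdadm /dev/md0 --add /dev/hdb1
# mdadm /dev/md1 --add /dev/hdb2
# mdadm /dev/md2 --add /dev/hdb3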

The rebuilding can be seen in /proc/mdstat:

# cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdb1[0] hda1[1]
102208 blocks [2/2] [UU]

md2 : active raid1 hda3[1]
262016 blocks [2/1] [_U]

md1 : active raid1 hdb2[2] hda2[1]
119684160 blocks [2/1] [_U]
[>....................] recovery = 0.2% (250108/119684160) finish=198.8min speed=10004K/sec

unused devices: <none>
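With a finish time of nearly 200 minutes on md1, you’ll be looking at this output a lot; watch(1) saves the retyping. (Purely a convenience, not part of the original procedure – the rebuild proceeds regardless.)

# watch -n 30 cat /proc/mdstat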

md0, a small array, has already finished rebuilding (UU), while md1 has only begun. Once md1 completes, mdadm will show:

# mdadm -D /dev/md1
/dev/md1:
Version : 00.90.00
Creation Time : Thu Aug 21 12:21:21 2003
Raid Level : raid1
Array Size : 119684160 (114.13 GiB 122.55 GB)
Device Size : 119684160 (114.13 GiB 122.55 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent

Update Time : Fri Oct 15 13:19:11 2004
State : dirty, no-errors
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

    Number   Major   Minor   RaidDevice  State
       0       3      66         0       active sync   /dev/hdb2
       1       3       2         1       active sync   /dev/hda2
UUID : ede70f08:0fdf752d:b408d85a:ada8922b

I was a little surprised that this process wasn’t entirely automatic; there’s no reason it couldn’t be. This is an older Linux install, and I don’t know whether more modern versions rebuild automatically.
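One way to make at least the failure half automatic is to keep a hot spare: with nr-spare-disks set and an extra device listed in /etc/raidtab, the md driver starts rebuilding onto the spare as soon as a member fails, with no operator involvement. A sketch of what such a stanza might look like (/dev/hdc2 is a hypothetical third drive, not part of my setup):

raiddev /dev/md1
    raid-level              1
    nr-raid-disks           2
    nr-spare-disks          1
    chunk-size              64k
    persistent-superblock   1
    device                  /dev/hda2
    raid-disk               0
    device                  /dev/hdb2
    raid-disk               1
    device                  /dev/hdc2
    spare-disk              0

Replacing the physically dead drive afterwards still takes the manual steps above, of course.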
