Simulating a RAID Failure

This post is about simulating the failure of a software RAID device. We will mark one of the underlying partitions as failed and then walk through recovering from the failure.

Here we are using a Level 1 (RAID 1) array, as explained in my previous post Configuring software RAID (Level 1) on Linux. The array is built from 2 partitions, /dev/sdb1 and /dev/sdb2. We will make /dev/sdb2 fail and then replace it with a new partition, /dev/sdb3, of the same size.
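Before starting, it is worth confirming that the replacement partition /dev/sdb3 exists and is the same size as the other two. A minimal check, assuming the partition has already been created (for example with fdisk):

# Sizes are listed in 1K blocks; /dev/sdb3 should match /dev/sdb1 and /dev/sdb2
grep sdb /proc/partitions

# Or list the partition table directly
fdisk -l /dev/sdb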

The current status of the RAID array can be checked in /proc/mdstat:

[root@localhost avdeo]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[1] sdb1[0]
987840 blocks [2/2] [UU]

unused devices: <none>
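For a quick scripted health check, the [UU] flags shown above can be tested directly. A minimal sketch, assuming a single two-disk array as in this post:

# [UU] means both mirror halves are up; an underscore marks a failed or missing device
grep -q '\[UU\]' /proc/mdstat && echo "md0 healthy" || echo "md0 degraded"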

Simulating a software RAID failure and recovering from it can be done in the following 3 steps.

1) Make the device fail.

You can mark the device as failed using the mdadm -f command.

[root@localhost avdeo]# mdadm -f /dev/md0 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md0

If we check /proc/mdstat, we can see that the device has been marked as faulty:

[root@localhost avdeo]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[2](F) sdb1[0]
987840 blocks [2/1] [U_]

unused devices: <none>

We can also see the corresponding messages in the /var/log/messages file:

[root@localhost avdeo]# tail -f /var/log/messages
Sep 16 09:04:33 localhost kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 16 09:17:05 localhost kernel: raid1: Disk failure on sdb2, disabling device.
Sep 16 09:17:05 localhost kernel:       Operation continuing on 1 devices
Sep 16 09:17:05 localhost kernel: RAID1 conf printout:
Sep 16 09:17:05 localhost kernel:  --- wd:1 rd:2
Sep 16 09:17:05 localhost kernel:  disk 0, wo:0, o:1, dev:sdb1
Sep 16 09:17:05 localhost kernel:  disk 1, wo:1, o:0, dev:sdb2
Sep 16 09:17:05 localhost kernel: RAID1 conf printout:
Sep 16 09:17:05 localhost kernel:  --- wd:1 rd:2
Sep 16 09:17:05 localhost kernel:  disk 0, wo:0, o:1, dev:sdb1
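The failure can also be confirmed from mdadm itself; the State line should now report the array as degraded and /dev/sdb2 should be listed as faulty (the exact wording may differ between mdadm versions):

# Full view of the degraded array, including the faulty member
mdadm --detail /dev/md0

# The same kernel messages are also visible via dmesg
dmesg | tail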

2) Remove the device from the RAID array

[root@localhost avdeo]# mdadm --remove /dev/md0 /dev/sdb2
mdadm: hot removed /dev/sdb2

[root@localhost avdeo]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
987840 blocks [2/1] [U_]

unused devices: <none>

As we can see, sdb2 no longer appears in /proc/mdstat.

Also, if we check the output of the mdadm --detail command, we can see that /dev/sdb2 has been removed:

[root@localhost avdeo]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Sep 16 08:59:31 2008
Raid Level : raid1
Array Size : 987840 (964.85 MiB 1011.55 MB)
Device Size : 987840 (964.85 MiB 1011.55 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Sep 16 09:19:08 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0

UUID : dcf37c14:179f9a7a:ed1f46c6:a8160267
Events : 0.6

Number   Major   Minor   RaidDevice State
0       8       17        0      active sync   /dev/sdb1
1       0        0        1      removed
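If the removed partition will ever be reused, in this array or another one, its old RAID superblock can be wiped first so mdadm does not pick up stale metadata. A minimal sketch; run this only if the data on /dev/sdb2 is no longer needed:

# Destroys the md superblock on the removed partition (nothing else on the partition is touched)
mdadm --zero-superblock /dev/sdb2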

3) Add a new device

[root@localhost avdeo]# mdadm --add /dev/md0 /dev/sdb3
mdadm: added /dev/sdb3

Check mdadm --detail again:

[root@localhost avdeo]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Sep 16 08:59:31 2008
Raid Level : raid1
Array Size : 987840 (964.85 MiB 1011.55 MB)
Device Size : 987840 (964.85 MiB 1011.55 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent

Update Time : Tue Sep 16 09:19:08 2008
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1

Rebuild Status : 34% complete

UUID : dcf37c14:179f9a7a:ed1f46c6:a8160267
Events : 0.6

Number   Major   Minor   RaidDevice State
0       8       17        0      active sync   /dev/sdb1
2       8       19        1      spare rebuilding   /dev/sdb3
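The rebuild can be followed until /proc/mdstat shows [2/2] [UU] again. A small sketch for monitoring, assuming the watch utility is available, plus an optional step to record the new layout in the mdadm configuration file (its location varies by distribution):

# Refresh the status every 5 seconds; stop with Ctrl+C once the array shows [2/2] [UU]
watch -n 5 cat /proc/mdstat

# Optionally append the current array definition to the mdadm configuration file
mdadm --detail --scan >> /etc/mdadm.conf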

That's it!! We are done.

Hope this helps!!
