This post is about simulating the failure of a software RAID device. We mark a member partition of the array as failed and then recover from the simulated failure.
We are using the Level 1 (mirror) array built in my previous post, Configuring software RAID (Level 1) on Linux, which consists of two partitions, /dev/sdb1 and /dev/sdb2. We will make /dev/sdb2 fail and then replace it with a new partition, /dev/sdb3, of the same size.
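Note that /dev/sdb3 does not have to exist yet; if it does not, a partition of the same size can be created up front. A minimal sketch, assuming /dev/sdb still has free space (the exact fdisk dialogue depends on your disk layout):
[root@localhost avdeo]# fdisk /dev/sdb      # create a new partition of the same size, type fd (Linux raid autodetect)
[root@localhost avdeo]# partprobe /dev/sdb  # ask the kernel to re-read the partition table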
The status of the current RAID array can be obtained using:
[root@localhost avdeo]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[1] sdb1[0]
987840 blocks [2/2] [UU]
unused devices: <none>
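On a real system (as opposed to this simulation) you would normally want to be notified when a member fails rather than discovering it by reading /proc/mdstat. mdadm has a monitor mode for that; a minimal sketch, assuming local mail delivery to root works on the box:
[root@localhost avdeo]# mdadm --monitor --scan --mail=root --delay=300 --daemonise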
Simulating a software RAID failure can be done easily using the following 3 steps.
1) Make the device fail.
You can mark the device as failed using the mdadm -f (--fail) command:
[root@localhost avdeo]# mdadm -f /dev/md0 /dev/sdb2
mdadm: set /dev/sdb2 faulty in /dev/md0
If we check /proc/mdstat we can see that the device has been marked as faulty:
[root@localhost avdeo]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb2[2](F) sdb1[0]
987840 blocks [2/1] [U_]
unused devices: <none>
We can also see the corresponding messages in the /var/log/messages file:
[root@localhost avdeo]# tail -f /var/log/messages
Sep 16 09:04:33 localhost kernel: EXT3-fs: mounted filesystem with ordered data mode.
Sep 16 09:17:05 localhost kernel: raid1: Disk failure on sdb2, disabling device.
Sep 16 09:17:05 localhost kernel: Operation continuing on 1 devices
Sep 16 09:17:05 localhost kernel: RAID1 conf printout:
Sep 16 09:17:05 localhost kernel: --- wd:1 rd:2
Sep 16 09:17:05 localhost kernel: disk 0, wo:0, o:1, dev:sdb1
Sep 16 09:17:05 localhost kernel: disk 1, wo:1, o:0, dev:sdb2
Sep 16 09:17:05 localhost kernel: RAID1 conf printout:
Sep 16 09:17:05 localhost kernel: --- wd:1 rd:2
Sep 16 09:17:05 localhost kernel: disk 0, wo:0, o:1, dev:sdb1
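Apart from the kernel log, the md superblock on the member partition itself can also be inspected; a quick check, assuming the partition still carries a valid superblock:
[root@localhost avdeo]# mdadm --examine /dev/sdb2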
2) Remove the device from the RAID array.
[root@localhost avdeo]# mdadm --remove /dev/md0 /dev/sdb2
mdadm: hot removed /dev/sdb2
[root@localhost avdeo]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdb1[0]
987840 blocks [2/1] [U_]
unused devices: <none>
As we can see, sdb2 is no longer listed in /proc/mdstat.
Also, if we check with the mdadm --detail command, we can see that /dev/sdb2 has been removed.
[root@localhost avdeo]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Sep 16 08:59:31 2008
Raid Level : raid1
Array Size : 987840 (964.85 MiB 1011.55 MB)
Device Size : 987840 (964.85 MiB 1011.55 MB)
Raid Devices : 2
Total Devices : 1
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Sep 16 09:19:08 2008
State : clean, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
UUID : dcf37c14:179f9a7a:ed1f46c6:a8160267
Events : 0.6
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 0 0 1 removed
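If you intend to reuse the removed partition later (in this array or another one), it can be safer to wipe its old md superblock first so it is not auto-assembled by mistake. This is optional and not needed for the simulation itself:
[root@localhost avdeo]# mdadm --zero-superblock /dev/sdb2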
3) Add a new device.
[root@localhost avdeo]# mdadm --add /dev/md0 /dev/sdb3
mdadm: added /dev/sdb3
Check mdadm --detail again:
[root@localhost avdeo]# mdadm --detail /dev/md0
/dev/md0:
Version : 00.90.03
Creation Time : Tue Sep 16 08:59:31 2008
Raid Level : raid1
Array Size : 987840 (964.85 MiB 1011.55 MB)
Device Size : 987840 (964.85 MiB 1011.55 MB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Tue Sep 16 09:19:08 2008
State : clean, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 34% complete
UUID : dcf37c14:179f9a7a:ed1f46c6:a8160267
Events : 0.6
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
2 8 19 1 spare rebuilding /dev/sdb3
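The rebuild progress can also be followed live from /proc/mdstat; a convenient way, assuming the watch utility is installed:
[root@localhost avdeo]# watch -n 5 cat /proc/mdstat
Once the rebuild is complete, the array goes back to [2/2] [UU]. If you maintain an /etc/mdadm.conf, the ARRAY line printed by mdadm --detail --scan can be merged into it so the array is assembled correctly at boot.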
That's it!! We are done.
Hope this helps!!