Sunday, March 05, 2017

add a disk as a hot spare in an mdadm RAID

You can check the current state of the array with cat /proc/mdstat; that's where all of the status output in this example comes from.
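For a more detailed view, including per-device state and the array UUID, mdadm's --detail mode also works:
$ sudo mdadm --detail /dev/md127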
So let's assume we have md127 with 3 disks in a raid1. Here they're just partitions of one disk, but that doesn't matter for the procedure.
md127 : active raid1 vdb3[2] vdb2[1] vdb1[0]
      102272 blocks super 1.2 [3/3] [UUU]
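(If you want to reproduce this on a test machine, an equivalent 3-way raid1 can be created with something like the following; the partition names are just whatever you have free:)
$ sudo mdadm --create /dev/md127 --level=1 --raid-devices=3 /dev/vdb1 /dev/vdb2 /dev/vdb3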
We need to mark one of the disks as failed before we can remove it:
$ sudo mdadm --manage /dev/md127 --fail /dev/vdb2
mdadm: set /dev/vdb2 faulty in /dev/md127
And the status now shows it as failed:
md127 : active raid1 vdb3[2] vdb2[1](F) vdb1[0]
      102272 blocks super 1.2 [3/2] [U_U]
We can now remove this disk:
$ sudo mdadm --manage /dev/md127 --remove /dev/vdb2
mdadm: hot removed /dev/vdb2 from /dev/md127

md127 : active raid1 vdb3[2] vdb1[0]
      102272 blocks super 1.2 [3/2] [U_U]
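As an aside, the fail and remove steps can be combined into a single invocation, since mdadm applies the --manage actions in order; something like this should be equivalent:
$ sudo mdadm --manage /dev/md127 --fail /dev/vdb2 --remove /dev/vdb2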
And now shrink the array to 2 devices:
$ sudo mdadm --grow /dev/md127 --raid-devices=2
raid_disks for /dev/md127 set to 2
unfreeze
At this point we have successfully reduced the array down to 2 disks:
md127 : active raid1 vdb3[2] vdb1[0]
      102272 blocks super 1.2 [2/2] [UU]
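One caveat: if your mdadm.conf pins the device count (some distros generate ARRAY lines with num-devices=), you may want to regenerate those lines after changing the geometry. The usual way is something like:
$ sudo mdadm --detail --scan
and then update /etc/mdadm/mdadm.conf (or /etc/mdadm.conf, depending on the distro) with the output.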
The disk can now be re-added as a hot spare:
$ sudo mdadm -a /dev/md127 /dev/vdb2
mdadm: added /dev/vdb2

md127 : active raid1 vdb2[3](S) vdb3[2] vdb1[0]
      102272 blocks super 1.2 [2/2] [UU]
The (S) shows it's a hot spare.
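Hot spares can also be shared between arrays: if you tag several ARRAY lines in mdadm.conf with the same spare-group and run mdadm in monitor mode, it will move a spare to whichever array in the group degrades. A sketch, where the UUIDs and group name are placeholders:
ARRAY /dev/md126 metadata=1.2 UUID=<uuid-of-md126> spare-group=mygroup
ARRAY /dev/md127 metadata=1.2 UUID=<uuid-of-md127> spare-group=mygroup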
We can verify this works as expected by failing an existing disk and watching a rebuild take place onto the spare:
$ sudo mdadm --manage /dev/md127 --fail /dev/vdb1
mdadm: set /dev/vdb1 faulty in /dev/md127

md127 : active raid1 vdb2[3] vdb3[2] vdb1[0](F)
      102272 blocks super 1.2 [2/1] [_U]
      [=======>.............]  recovery = 37.5% (38400/102272) finish=0.0min speed=38400K/sec
vdb2 is no longer marked (S) because it's no longer a hot spare: it has been promoted to an active member and is rebuilding.
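To get back to a healthy state with a spare, the failed disk just has to be removed and re-added, exactly as before:
$ sudo mdadm --manage /dev/md127 --remove /dev/vdb1
$ sudo mdadm -a /dev/md127 /dev/vdb1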
After the failed disk has been re-added, it is now marked as the hot spare:
md127 : active raid1 vdb1[4](S) vdb2[3] vdb3[2]
      102272 blocks super 1.2 [2/2] [UU]
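Of course, a hot spare only helps if the failure is noticed; in production you'd typically also run mdadm in monitor mode so it detects failed devices, activates spares and sends alerts. Something along these lines, where the mail address is a placeholder:
$ sudo mdadm --monitor --scan --daemonise --mail root@localhost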