https://raid.wiki.kernel.org/index.php/Growing
linux mdadm replace disk raid 5
linux mdadm replace disk raid 6
centos linux mdadm replace disk raid 5
centos linux mdadm replace disk raid 6
Short version:
Here are the steps to replace a disk...
First, fail the disk:
Code:
sudo mdadm --manage /dev/md0 --fail /dev/sdb1
Then, remove it from the array
Code:
sudo mdadm --manage /dev/md0 --remove /dev/sdb1
Then, replace it with a new one...
Code:
sudo mdadm --manage /dev/md0 --add /dev/sdb1
If the disk is not available anymore, you can just fdisk the new drive for Linux RAID and then add it to the array with the last command above; mdadm will add it in and start to resync the array. Hope that helps.
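As a rough sketch of that, assuming the replacement shows up as /dev/sdb and is small enough for an MSDOS label:
Code:
# one partition spanning the whole disk, type fd (Linux raid autodetect)
echo ',,fd' | sudo sfdisk /dev/sdb
# hand the new partition to the array; mdadm starts the resync on its own
sudo mdadm --manage /dev/md0 --add /dev/sdb1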
Command list:
Faulty disk recovery
- umount /dev/md0
- mdadm --stop /dev/md0
- mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
- if the command above does not show all your drives, then you have a problem and should use force...
- mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
- remove faulty disk if necessary, and add new ones
- mdadm --manage /dev/md0 --add /dev/sdc1
- mdadm --manage /dev/md0 --add /dev/sdd1
- Example error when the drive is already in the array: "Error: mdadm: Cannot open /dev/sdb1: Device or resource busy"
- cat /proc/mdstat (watch -n1 cat /proc/mdstat)
Output
$ mdadm --stop /dev/md0
mdadm: stopped /dev/md0
$ mdadm --assemble /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm: /dev/md0 has been started with 4 drives.
$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sda1[0] sdd1[5] sdc1[4] sdb1[1]
3906764800 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
bitmap: 0/15 pages [0KB], 65536KB chunk
unused devices: <none>
Take my RAID for example:
root@mark21:/tmp/etc/udev# fdisk -l /dev/sda
Disk /dev/sda: 640.1 GB, 640135028736 bytes
255 heads, 63 sectors/track, 77825 cylinders, total 1250263728 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x0000ffc4
Device Boot Start End Blocks Id System
/dev/sda1 2048 1240233983 620115968 fd Linux raid autodetect
root@mark21:/tmp/etc/udev# dumpe2fs /dev/sda1
dumpe2fs 1.41.14 (22-Dec-2010)
dumpe2fs: Bad magic number in super-block while trying to open /dev/sda
Couldn't find valid filesystem superblock.
That you were able to recreate the RAID set at all is extremely lucky, but that doesn't change the fundamental flaws in your deployment. This will happen again.
What I would recommend is:
- Backup everything on that raid set
- Destroy the array and erase the md superblock from each device (man mdadm; see the sketch after this list)
- Zero out those disks:
dd if=/dev/zero of=/dev/sdX bs=1M count=100
- Create partitions on sda, sdc, sdd, & sdf that span 99% of the disk [0]
- Tag those partitions as type fd (Linux raid autodetect; see the linux-raid wiki)
- Never, ever format these partitions with any sort of filesystem
- Create a new RAID 5: mdadm --create /dev/md0 -v -f -l 5 -n 4 /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
- Update new UUID in /etc/mdadm.conf
- Live happily ever after
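A minimal sketch of the superblock-erase and mdadm.conf steps referenced above; the device names are only examples:
mdadm --stop /dev/md0
# wipe the old md metadata from each former member (only after everything is backed up)
mdadm --zero-superblock /dev/sda1 /dev/sdc1 /dev/sdd1 /dev/sde1
# after recreating the array, record its new UUID
mdadm --detail --scan >> /etc/mdadm.conf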
Command history:
lsblk
mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sda
cat /proc/mdstat
mdadm --stop /dev/md0
mdadm --assemble --scan
mdadm --examine /dev/sdb /dev/sdc /dev/sdd /dev/sda
mdadm --examine /dev/sda1
mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sdb1
mdadm --assemble --scan -v
mdadm --examine /dev/sda1 |more
mdadm --examine /dev/sdc1 |more
mdadm --examine /dev/sda1 |more
mdadm --examine /dev/sdc1 |more
mdadm --examine /dev/sda1 |more
mdadm --examine /dev/sdb1 |more
mdadm --examine /dev/sdd1 |more
mdadm --assemble --scan -v
mdadm --assemble --force /dev/md0 /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mdadm --detail /dev/md0
cat /proc/mdstat
mdadm --detail /dev/md0
mdadm --examine --scan
mdadm --detail /dev/md0
vim /etc/mdadm.conf
mdadm -E --scan
dmesg |grep md
cat /proc/mdstat
mdadm -E /dev/sd[a-d]1
mdadm -E /dev/sd[a]1
mdadm -E /dev/sd[b]1
mdadm -E /dev/sdc1
mdadm -E /dev/sdd1
mdadm -E /dev/sda1
mdadm --detail /dev/md0
mdadm --manage /dev/md0 --add /dev/sdc1
mdadm --manage /dev/md0 --add /dev/sdd1
mdadm --detail /dev/md0
Long version:
Replacing a failed disk in a mdadm RAID
Introduction
I have a RAID5 with 4 disks (see Rebuilding and updating my Linux NAS and HTPC server), and from my daily digest emails of the system I discovered that one of my disks had issues, based on errors showing up in dmesg.
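A quick way to pull those errors out again, assuming the affected drive is /dev/sde as below:
dmesg | grep -i sde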
Investigating the bad drive
To further investigate the disk in question (/dev/sde) I looked into the S.M.A.R.T (Self-Monitoring, Analysis and Reporting Technology) status of the sick drive:
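Presumably via smartctl from the smartmontools package, something like:
smartctl -a /dev/sde    # overall health, SMART attributes and the drive's error log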
This didn’t really tell me anything, so I started a “long” self-test with the following command. The long self-test takes about 2 hours – alternatively there is a short, but less thorough self-test that takes around 2 minutes:
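Most likely the extended offline test, started along these lines (drive name as above):
smartctl -t long /dev/sde    # runs in the background on the drive itself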
The output of a self-test can be found with the following command. In my case it was clear that the drive indeed was in trouble.
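Presumably by reading the self-test log, e.g.:
smartctl -l selftest /dev/sde    # lists the results of past self-tests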
I ordered a 3TB WD RED disk (especially made for NAS operations) to replace it. It is much larger and initially I will not be able to utilize the 3TB, but once all the old 1TB disks eventually fail and I have replaced them all with 3TB disks, I can grow the raid.
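Growing it later would look roughly like this, assuming the filesystem sits directly on /dev/md0 and is ext4:
mdadm --grow /dev/md0 --size=max    # let the array use the full capacity of the new, larger members
resize2fs /dev/md0                  # then grow the filesystem on top to match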
Removing the faulty disk
An important part of a RAID setup is the ability to cope with the failure of a faulty disk. The enclosure I have does not support hot-swap and has no separate light for each disk, so I need a way to find out which of the disks to replace. Finding the serial number of the disk is fairly easy:
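Probably one of these (both print the drive's identity data, including the serial):
smartctl -i /dev/sde | grep -i serial
hdparm -I /dev/sde | grep -i serial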
and luckily the Western Digital disks I have came with a small sticker which shows the serial on the disk. So now I know the serial number of the faulty disk; before shutting down and replacing the disk, I marked it as failed in mdadm and removed it from the raid:
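Presumably the same fail/remove pair as in the short version, assuming the failing member is /dev/sde1:
mdadm --manage /dev/md0 --fail /dev/sde1
mdadm --manage /dev/md0 --remove /dev/sde1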
Adding the new drive
Having replaced the faulty disk and inserted the new one, I found the serial on the back and compared it to the serial of /dev/sde to make sure I was about to format the right disk:
Partitioning disks over 2TB does not work with an MSDOS partition table, so I needed to use parted (instead of fdisk) to partition the disk correctly. The "-a optimal" option makes parted use the optimum alignment as given by the disk topology information. This aligns to a multiple of the physical block size in a way that guarantees optimal performance.
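Roughly like this, assuming the new disk is /dev/sde and gets one GPT partition flagged for RAID:
parted -s -a optimal /dev/sde mklabel gpt
parted -s -a optimal /dev/sde mkpart primary 0% 100%
parted -s /dev/sde set 1 raid on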
Now the disk was ready for inclusion in the raid:
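Most likely the familiar --add, assuming the new partition is /dev/sde1:
mdadm --manage /dev/md0 --add /dev/sde1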
Over the next 3 hours I could monitor the rebuild using the following command:
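For example, the same mdstat view as above:
watch -n 60 cat /proc/mdstat    # or check mdadm --detail /dev/md0 now and then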
Monitoring health of the raid
I have several systems in place to monitor the health of my raid (among other things):
- logwatch – monitors my /var/log/messages for anything out of the ordinary and mails me the output on a daily basis.
- mdadm – mdadm will mail me if a disk has completely failed or the raid for some other reason fails (see the sketch after this list). A complete resync is done every week.
- smartd – I have smartd running “short” tests every night and long tests every second week. Reports are mailed to me.
- munin – graphical and historical monitoring of performance and all stats of the server.
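A minimal sketch of the mdadm and smartd mail bits; the mail address and the device entry are placeholders:
# /etc/mdadm.conf: tell mdadm --monitor where to send alerts
MAILADDR admin@example.com
# most distros run the monitor as a service; by hand it would be:
mdadm --monitor --scan --daemonise
# /etc/smartd.conf: all default checks, short self-test nightly at 02:00,
# long self-test Saturdays at 03:00, mail the reports
/dev/sde -a -s (S/../.././02|L/../../6/03) -m admin@example.com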