{{tag>raid hardware}}

====== Change disk in software RAID ======

First find out what kind of partition table you have:

<code>
# gdisk -l /dev/nvme1n1
GPT fdisk (gdisk) version 1.0.1

Warning: Partition table header claims that the size of partition table
entries is 1153912944 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Warning: Partition table header claims that the size of partition table
entries is 0 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present


***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.
***************************************************************

Disk /dev/nvme1n1: 1000215216 sectors, 476.9 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A9632864-4B74-4D27-A172-6E0CF4EAD07D
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1000215182
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048        67110911   32.0 GiB    FD00  Linux RAID
   2        67110912        68159487   512.0 MiB   FD00  Linux RAID
   3        68159488      1000213167   444.4 GiB   FD00  Linux RAID
</code>

We can see from the scan ("MBR: MBR only", "GPT: not present") that the partition table is **MBR**.

Create backups of the partition tables of all drives, just in case:

<code>
sfdisk --dump /dev/nvme1n1 > nvme1n1_mbr.bak
sfdisk --dump /dev/nvme0n1 > nvme0n1_mbr.bak
</code>

===== Check which disk has failed =====

<code>
~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523712 blocks super 1.2 [2/2] [UU]

md2 : active raid1 nvme1n1p3[1](F) nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      33521664 blocks super 1.2 [2/2] [UU]

unused devices: <none>
</code>

We can see that md2 is degraded: the nvme1 drive is marked as failed (**nvme1n1p3[1](F)**) and the array state is **[U_]**.

===== Find the serial number of the disk =====

Now we need the serial number of the disk so that we know which physical disk to replace. Run:

<code>
smartctl --all /dev/nvme1 | grep -i serial
</code>

===== Remove the failing disk from the RAID array =====

<code>
# mdadm --manage /dev/md2 --remove /dev/nvme1n1p3
</code>

Check mdstat again to make sure the drive was removed:

<code>
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523712 blocks super 1.2 [2/2] [UU]

md2 : active raid1 nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      33521664 blocks super 1.2 [2/2] [UU]

unused devices: <none>
</code>

You can check with the mdadm command as well:

<code>
# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Thu Feb  1 15:05:55 2018
     Raid Level : raid1
     Array Size : 465895744 (444.31 GiB 477.08 GB)
  Used Dev Size : 465895744 (444.31 GiB 477.08 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Oct 17 13:33:12 2022
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : rescue:2
           UUID : 35424aca:3627ea84:f6635387:331bd056
         Events : 68794798

    Number   Major   Minor   RaidDevice State
       0     259        6        0      active sync   /dev/nvme0n1p3
       -       0        0        1      removed
</code>

You need to do this for each md array, i.e. for each partition. So if there are two more arrays, run:

<code>
mdadm --manage /dev/md0 --remove /dev/nvme1n1p1
mdadm --manage /dev/md1 --remove /dev/nvme1n1p2
</code>

You might get an error:

<code>
mdadm: hot remove failed for /dev/nvme1n1p1: Device or resource busy
</code>

In some cases a drive is only partly defective, so that, for example, only /dev/md2 is in the [U_] state while all other arrays are still in the [UU] state. In that case you first need to mark the drive as failed in each remaining md array:

<code>
mdadm --manage /dev/md0 --fail /dev/nvme1n1p1
mdadm --manage /dev/md1 --fail /dev/nvme1n1p2
</code>

Then rerun the remove commands.
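If you have to repeat this for several arrays, the fail-and-remove steps can also be wrapped in a small shell loop. This is only a sketch: it assumes the partition-to-array mapping shown above (nvme1n1pN is a member of md(N-1)), so adjust the device names to your layout.

<code>
# Sketch only: fail and then remove every partition of the dying disk.
# Assumes nvme1n1p1 -> md0, nvme1n1p2 -> md1, nvme1n1p3 -> md2, as in the mdstat output above.
for n in 1 2 3; do
  mdadm --manage /dev/md$((n-1)) --fail   /dev/nvme1n1p${n}
  mdadm --manage /dev/md$((n-1)) --remove /dev/nvme1n1p${n}
done
</code>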
===== Shut down the machine and replace the disk =====

Send a ticket to support, or replace the disk yourself if you have access to it. After the disk replacement you should see something like this:

<code>
cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active (auto-read-only) raid1 nvme0n1p1[0]
      33521664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 nvme0n1p2[0]
      523712 blocks super 1.2 [2/1] [U_]

unused devices: <none>
</code>

<code>
~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
nvme0n1     259:1    0   477G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme0n1p2 259:3    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme0n1p3 259:4    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
</code>

Our new disk (**nvme1n1**) is there, but it is not partitioned yet.

===== Partition the new disk =====

First you need to replicate the partition schema on the new disk. Simply copy the partition table to the new drive using sfdisk (of course you could also partition it manually):

<code>
sfdisk -d /dev/nvme0n1 | sfdisk /dev/nvme1n1
</code>

where /dev/nvme0n1 is the source drive and /dev/nvme1n1 is the target drive. You should now see this output:

<code>
Checking that no-one is using this disk right now ... OK

Disk /dev/nvme1n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x025ae6fe.
/dev/nvme1n1p1: Created a new partition 1 of type 'Linux raid autodetect' and of size 32 GiB.
/dev/nvme1n1p2: Created a new partition 2 of type 'Linux raid autodetect' and of size 512 MiB.
/dev/nvme1n1p3: Created a new partition 3 of type 'Linux raid autodetect' and of size 444.4 GiB.
/dev/nvme1n1p4: Done.

New situation:

Device         Boot    Start        End   Sectors   Size Id Type
/dev/nvme1n1p1          2048   67110911  67108864    32G fd Linux raid autodetect
/dev/nvme1n1p2      67110912   68159487   1048576   512M fd Linux raid autodetect
/dev/nvme1n1p3      68159488 1000213167 932053680 444.4G fd Linux raid autodetect

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
</code>

<code>
~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
├─nvme1n1p1 259:5    0    32G  0 part
├─nvme1n1p2 259:6    0   512M  0 part
└─nvme1n1p3 259:7    0 444.4G  0 part
nvme0n1     259:1    0   477G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme0n1p2 259:3    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme0n1p3 259:4    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
</code>
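Note that the sfdisk copy above matches the **MBR** label found in the first step. If gdisk had reported a GPT label instead, a common approach (also described in the Hetzner guide referenced below) is to copy the table with sgdisk and then give the new disk fresh GUIDs. A minimal sketch, assuming the same source and target disks; the backup file name is only an example:

<code>
# GPT variant only (not needed for the MBR label used in this article).
sgdisk --backup=nvme0n1_gpt.bak /dev/nvme0n1       # dump the healthy disk's GPT
sgdisk --load-backup=nvme0n1_gpt.bak /dev/nvme1n1  # write it onto the new disk
sgdisk --randomize-guids /dev/nvme1n1              # avoid duplicate GUIDs on the new disk
</code>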
===== Add the new disk to the array =====

You need to do this for each partition:

<code>
mdadm /dev/md0 -a /dev/nvme1n1p1
mdadm /dev/md1 -a /dev/nvme1n1p2
mdadm /dev/md2 -a /dev/nvme1n1p3
</code>

Now you should see this:

<code>
~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3[2] nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
        resync=DELAYED
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[2] nvme0n1p1[0]
      33521664 blocks super 1.2 [2/1] [U_]
      [========>............]  recovery = 44.1% (14805888/33521664) finish=1.5min speed=200056K/sec

md1 : active raid1 nvme1n1p2[2] nvme0n1p2[0]
      523712 blocks super 1.2 [2/1] [U_]
        resync=DELAYED
</code>

<code>
~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
├─nvme1n1p1 259:5    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme1n1p2 259:6    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme1n1p3 259:7    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
nvme0n1     259:1    0   477G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme0n1p2 259:3    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme0n1p3 259:4    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
</code>

===== Bootloader installation =====

Wait for the resync to complete, just in case. Since the serial number of the disk has changed, we need to generate a new device map for GRUB2, so just reinstall GRUB:

<code>
grub-install /dev/nvme1n1
</code>

====== Tested on ======

  * Debian GNU/Linux 9.13 (stretch)
  * Hetzner hosted server
  * mdadm - v3.4 - 28th January 2016

====== See also ======

  * [[wiki:reinstall_grub_mdraid1_array|Reinstall grub on mdraid1 array]]
  * [[wiki:create_raid_5_4_disks_encryption_hetzner|Create RAID 5 from 4 disks with encryption on Hetzner]]

====== References ======

  * https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid
  * https://www.redhat.com/sysadmin/raid-drive-mdadm
  * https://unix.stackexchange.com/questions/120221/gpt-or-mbr-how-do-i-know
  * https://www.thomas-krenn.com/en/wiki/Analyzing_a_Faulty_Hard_Disk_using_Smartctl