{{tag>raid hardware}}
====== Change disk in software raid ======
First, find out what type of partition table the drives use:
# gdisk -l /dev/nvme1n1
GPT fdisk (gdisk) version 1.0.1
Warning: Partition table header claims that the size of partition table
entries is 1153912944 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Warning: Partition table header claims that the size of partition table
entries is 0 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Partition table scan:
MBR: MBR only
BSD: not present
APM: not present
GPT: not present
***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.
***************************************************************
Disk /dev/nvme1n1: 1000215216 sectors, 476.9 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A9632864-4B74-4D27-A172-6E0CF4EAD07D
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1000215182
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)
Number Start (sector) End (sector) Size Code Name
1 2048 67110911 32.0 GiB FD00 Linux RAID
2 67110912 68159487 512.0 MiB FD00 Linux RAID
3 68159488 1000213167 444.4 GiB FD00 Linux RAID
We can see that the partition table is **MBR**. Create backups of the partition tables of all drives, just in case:
sfdisk --dump /dev/nvme1n1 > nvme1n1_mbr.bak
sfdisk --dump /dev/nvme0n1 > nvme0n1_mbr.bak
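Should you ever need to restore a saved table, the dump can be fed straight back to sfdisk. A minimal sketch, assuming the backup file created above:
sfdisk /dev/nvme1n1 < nvme1n1_mbr.bak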
===== Check which disk is failed =====
~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
523712 blocks super 1.2 [2/2] [UU]
md2 : active raid1 nvme1n1p3[1](F) nvme0n1p3[0]
465895744 blocks super 1.2 [2/1] [U_]
bitmap: 4/4 pages [16KB], 65536KB chunk
md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
33521664 blocks super 1.2 [2/2] [UU]
unused devices: <none>
We can see that md2 is degraded: the nvme1 drive is marked as failed, **nvme1n1p3[1](F)**, and the array state shows **[U_]** instead of **[UU]**.
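For a quick overview of all arrays you can also loop over mdadm --detail; this is just a convenience sketch using the device names from this setup:
for md in /dev/md0 /dev/md1 /dev/md2; do
    echo "== $md =="
    mdadm --detail "$md" | grep -E 'State :|Failed Devices'
done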
===== Find serial number of disk =====
Now we need the serial number of the disk so that we know which physical disk to replace. Run
smartctl --all /dev/nvme1 | grep -i serial
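If the nvme-cli package is installed, you can alternatively list the model and serial numbers of all NVMe drives in one table:
nvme list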
===== Remove the failing disk from the RAID array =====
# mdadm --manage /dev/md2 --remove /dev/nvme1n1p3
Check mdstat again to make sure the drive has been removed:
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
523712 blocks super 1.2 [2/2] [UU]
md2 : active raid1 nvme0n1p3[0]
465895744 blocks super 1.2 [2/1] [U_]
bitmap: 4/4 pages [16KB], 65536KB chunk
md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
33521664 blocks super 1.2 [2/2] [UU]
unused devices: <none>
You can check with the mdadm command as well:
# mdadm --detail /dev/md2
/dev/md2:
Version : 1.2
Creation Time : Thu Feb 1 15:05:55 2018
Raid Level : raid1
Array Size : 465895744 (444.31 GiB 477.08 GB)
Used Dev Size : 465895744 (444.31 GiB 477.08 GB)
Raid Devices : 2
Total Devices : 1
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Oct 17 13:33:12 2022
State : active, degraded
Active Devices : 1
Working Devices : 1
Failed Devices : 0
Spare Devices : 0
Name : rescue:2
UUID : 35424aca:3627ea84:f6635387:331bd056
Events : 68794798
Number Major Minor RaidDevice State
0 259 6 0 active sync /dev/nvme0n1p3
- 0 0 1 removed
You need to do this for each md array, i.e. for each partition. So if there are two more arrays, run
mdadm --manage /dev/md0 --remove /dev/nvme1n1p1
mdadm --manage /dev/md1 --remove /dev/nvme1n1p2
You might get an error like this:
mdadm: hot remove failed for /dev/nvme1n1p1: Device or resource busy
In some cases a drive may be only partly defective: for example, only /dev/md2 is in the [U_] state, while all the other arrays are still in the [UU] state. In that case the remaining partitions are still considered healthy, so mdadm refuses to hot-remove them. First mark the drive as failed in each md array, like so:
mdadm --manage /dev/md0 --fail /dev/nvme1n1p1
mdadm --manage /dev/md1 --fail /dev/nvme1n1p2
Then rerun the remove commands.
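mdadm also accepts both actions in a single invocation, so failing and removing a partition can be done in one step. A sketch for the two arrays above:
mdadm --manage /dev/md0 --fail /dev/nvme1n1p1 --remove /dev/nvme1n1p1
mdadm --manage /dev/md1 --fail /dev/nvme1n1p2 --remove /dev/nvme1n1p2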
===== Shut down the machine and replace the disk =====
Send a ticket to support or replace the disk yourself if you have physical access to the machine.
After the disk replacement you should see something like this:
cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme0n1p3[0]
465895744 blocks super 1.2 [2/1] [U_]
bitmap: 4/4 pages [16KB], 65536KB chunk
md0 : active (auto-read-only) raid1 nvme0n1p1[0]
33521664 blocks super 1.2 [2/1] [U_]
md1 : active raid1 nvme0n1p2[0]
523712 blocks super 1.2 [2/1] [U_]
unused devices: <none>
~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 477G 0 disk
nvme0n1 259:1 0 477G 0 disk
├─nvme0n1p1 259:2 0 32G 0 part
│ └─md0 9:0 0 32G 0 raid1 [SWAP]
├─nvme0n1p2 259:3 0 512M 0 part
│ └─md1 9:1 0 511.4M 0 raid1 /boot
└─nvme0n1p3 259:4 0 444.4G 0 part
└─md2 9:2 0 444.3G 0 raid1 /
Our new disk **nvme1n1** is there, but it is not partitioned yet.
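If the replacement disk has been used before, it can be worth checking it for leftover filesystem or RAID signatures before partitioning. Without any options, wipefs only lists signatures and does not erase anything:
wipefs /dev/nvme1n1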
===== Partition the new disk =====
First you need to replicate the partition scheme on the new disk. Since the drives use MBR, simply copy the partition table to the new drive with sfdisk (of course, you could also partition it manually; for GPT disks see the note at the end of this section):
sfdisk -d /dev/nvme0n1 | sfdisk /dev/nvme1n1
where /dev/nvme0n1 is the source drive and /dev/nvme1n1 is the target drive.
You should now see output like this:
Checking that no-one is using this disk right now ... OK
Disk /dev/nvme1n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x025ae6fe.
/dev/nvme1n1p1: Created a new partition 1 of type 'Linux raid autodetect' and of size 32 GiB.
/dev/nvme1n1p2: Created a new partition 2 of type 'Linux raid autodetect' and of size 512 MiB.
/dev/nvme1n1p3: Created a new partition 3 of type 'Linux raid autodetect' and of size 444.4 GiB.
/dev/nvme1n1p4: Done.
New situation:
Device Boot Start End Sectors Size Id Type
/dev/nvme1n1p1 2048 67110911 67108864 32G fd Linux raid autodetect
/dev/nvme1n1p2 67110912 68159487 1048576 512M fd Linux raid autodetect
/dev/nvme1n1p3 68159488 1000213167 932053680 444.4G fd Linux raid autodetect
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 477G 0 disk
├─nvme1n1p1 259:5 0 32G 0 part
├─nvme1n1p2 259:6 0 512M 0 part
└─nvme1n1p3 259:7 0 444.4G 0 part
nvme0n1 259:1 0 477G 0 disk
├─nvme0n1p1 259:2 0 32G 0 part
│ └─md0 9:0 0 32G 0 raid1 [SWAP]
├─nvme0n1p2 259:3 0 512M 0 part
│ └─md1 9:1 0 511.4M 0 raid1 /boot
└─nvme0n1p3 259:4 0 444.4G 0 part
└─md2 9:2 0 444.3G 0 raid1 /
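Note: the sfdisk copy above works because these disks use an MBR partition table, as determined earlier. If the disks had used GPT, you would replicate the table with sgdisk instead and then randomize the GUIDs on the new disk so they do not collide with the source drive. A sketch with the same device names:
sgdisk --replicate=/dev/nvme1n1 /dev/nvme0n1
sgdisk --randomize-guids /dev/nvme1n1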
===== Add new disk to the array =====
You need to do this for each partition.
mdadm /dev/md0 -a /dev/nvme1n1p1
mdadm /dev/md1 -a /dev/nvme1n1p2
mdadm /dev/md2 -a /dev/nvme1n1p3
Now you should see this:
~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3[2] nvme0n1p3[0]
465895744 blocks super 1.2 [2/1] [U_]
resync=DELAYED
bitmap: 4/4 pages [16KB], 65536KB chunk
md0 : active raid1 nvme1n1p1[2] nvme0n1p1[0]
33521664 blocks super 1.2 [2/1] [U_]
[========>............] recovery = 44.1% (14805888/33521664) finish=1.5min speed=200056K/sec
md1 : active raid1 nvme1n1p2[2] nvme0n1p2[0]
523712 blocks super 1.2 [2/1] [U_]
resync=DELAYED
~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme1n1 259:0 0 477G 0 disk
├─nvme1n1p1 259:5 0 32G 0 part
│ └─md0 9:0 0 32G 0 raid1 [SWAP]
├─nvme1n1p2 259:6 0 512M 0 part
│ └─md1 9:1 0 511.4M 0 raid1 /boot
└─nvme1n1p3 259:7 0 444.4G 0 part
└─md2 9:2 0 444.3G 0 raid1 /
nvme0n1 259:1 0 477G 0 disk
├─nvme0n1p1 259:2 0 32G 0 part
│ └─md0 9:0 0 32G 0 raid1 [SWAP]
├─nvme0n1p2 259:3 0 512M 0 part
│ └─md1 9:1 0 511.4M 0 raid1 /boot
└─nvme0n1p3 259:4 0 444.4G 0 part
└─md2 9:2 0 444.3G 0 raid1 /
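The rebuild of the large data partition can take a while. You can watch the progress refresh automatically, and optionally raise the kernel's minimum rebuild speed (in KB/s per device) if the resync is crawling:
watch -n 5 cat /proc/mdstat
echo 100000 > /proc/sys/dev/raid/speed_limit_min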
===== Bootloader installation =====
Wait for the resync to complete, just in case.
Since the serial number of the disk has changed, we need to generate a new device map with GRUB2. Just reinstall GRUB:
grub-install /dev/nvme1n1
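On Debian it does not hurt to also regenerate the GRUB configuration afterwards; update-grub is the standard wrapper for grub-mkconfig:
update-grub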
====== Tested on ======
* Debian GNU/Linux 9.13 (stretch)
* Hetzner hosted server
* mdadm - v3.4 - 28th January 2016
====== See also ======
* [[wiki:reinstall_grub_mdraid1_array|Reinstall grub on mdraid1 array]]
* [[wiki:create_raid_5_4_disks_encryption_hetzner|Create RAID 5 from 4 disks with encryption on Hetzner]]
====== References ======
* https://docs.hetzner.com/robot/dedicated-server/raid/exchanging-hard-disks-in-a-software-raid
* https://www.redhat.com/sysadmin/raid-drive-mdadm
* https://unix.stackexchange.com/questions/120221/gpt-or-mbr-how-do-i-know
* https://www.thomas-krenn.com/en/wiki/Analyzing_a_Faulty_Hard_Disk_using_Smartctl