Change disk in software RAID
First, find out what type of partition table you have:
# gdisk -l /dev/nvme1n1
GPT fdisk (gdisk) version 1.0.1

Warning: Partition table header claims that the size of partition table
entries is 1153912944 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Warning: Partition table header claims that the size of partition table
entries is 0 bytes, but this program supports only 128-byte entries.
Adjusting accordingly, but partition table may be garbage.
Partition table scan:
  MBR: MBR only
  BSD: not present
  APM: not present
  GPT: not present

***************************************************************
Found invalid GPT and valid MBR; converting MBR to GPT format
in memory.
***************************************************************

Disk /dev/nvme1n1: 1000215216 sectors, 476.9 GiB
Logical sector size: 512 bytes
Disk identifier (GUID): A9632864-4B74-4D27-A172-6E0CF4EAD07D
Partition table holds up to 128 entries
First usable sector is 34, last usable sector is 1000215182
Partitions will be aligned on 2048-sector boundaries
Total free space is 4029 sectors (2.0 MiB)

Number  Start (sector)    End (sector)  Size       Code  Name
   1            2048        67110911   32.0 GiB    FD00  Linux RAID
   2        67110912        68159487   512.0 MiB   FD00  Linux RAID
   3        68159488      1000213167   444.4 GiB   FD00  Linux RAID
We can see that the partition table is MBR. Create backups of the partition tables of all drives, just in case:
sfdisk --dump /dev/nvme1n1 > nvme1n1_mbr.bak
sfdisk --dump /dev/nvme0n1 > nvme0n1_mbr.bak
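Should a restore ever be needed, such a dump can be written back with sfdisk (shown here for the first drive; this overwrites the current partition table):

sfdisk /dev/nvme1n1 < nvme1n1_mbr.bak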
Check which disk has failed
~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523712 blocks super 1.2 [2/2] [UU]

md2 : active raid1 nvme1n1p3[1](F) nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      33521664 blocks super 1.2 [2/2] [UU]

unused devices: <none>
We can see that md2 is degraded: the nvme1 drive is marked as failed (the (F) flag in nvme1n1p3[1](F)) and only one mirror is up ([U_] instead of [UU]).
Find the serial number of the disk
Now we need the serial number of the disk so that we know which physical disk to replace. Run
smartctl --all /dev/nvme1 | grep -i serial
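The output will look something like this (the serial number below is made up for illustration):

Serial Number:                      S599NX0R123456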
Remove the failing disk from the RAID array
# mdadm --manage /dev/md2 --remove /dev/nvme1n1p3
Check mdstat again to make sure the drive is removed
# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 nvme1n1p2[1] nvme0n1p2[0]
      523712 blocks super 1.2 [2/2] [UU]

md2 : active raid1 nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[1] nvme0n1p1[0]
      33521664 blocks super 1.2 [2/2] [UU]

unused devices: <none>
You can check with the mdadm command as well:
# mdadm --detail /dev/md2
/dev/md2:
        Version : 1.2
  Creation Time : Thu Feb  1 15:05:55 2018
     Raid Level : raid1
     Array Size : 465895744 (444.31 GiB 477.08 GB)
  Used Dev Size : 465895744 (444.31 GiB 477.08 GB)
   Raid Devices : 2
  Total Devices : 1
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Oct 17 13:33:12 2022
          State : active, degraded
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

           Name : rescue:2
           UUID : 35424aca:3627ea84:f6635387:331bd056
         Events : 68794798

    Number   Major   Minor   RaidDevice State
       0     259        6        0      active sync   /dev/nvme0n1p3
       -       0        0        1      removed
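To get a quick overview of all arrays at once, a small loop over the devices works too (a sketch assuming the three arrays md0 to md2 from above):

for md in /dev/md0 /dev/md1 /dev/md2; do
    echo "$md"; mdadm --detail "$md" | grep -E 'State :|Failed Devices'
done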
You need to do this for each md array, i.e. for each partition. So if there are two more arrays, run
mdadm --manage /dev/md0 --remove /dev/nvme1n1p1
mdadm --manage /dev/md1 --remove /dev/nvme1n1p2
You might get an error:
mdadm: hot remove failed for /dev/nvme1n1p1: Device or resource busy
In some cases a drive may be only partly defective: for example, only /dev/md2 is in the [U_] state, whereas all other arrays are in the [UU] state. In that case you first need to mark the drive as failed in each md array:
mdadm --manage /dev/md0 --fail /dev/nvme1n1p1
mdadm --manage /dev/md1 --fail /dev/nvme1n1p2
Then rerun the remove commands.
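As a shortcut, mdadm also accepts both steps in a single invocation, e.g. for the first array:

mdadm --manage /dev/md0 --fail /dev/nvme1n1p1 --remove /dev/nvme1n1p1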
Shut down the machine and replace the disk
Send a ticket to support, or replace the disk yourself if you have physical access to it.
After the disk replacement, you should see something like this:
cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active (auto-read-only) raid1 nvme0n1p1[0]
      33521664 blocks super 1.2 [2/1] [U_]

md1 : active raid1 nvme0n1p2[0]
      523712 blocks super 1.2 [2/1] [U_]

unused devices: <none>

~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
nvme0n1     259:1    0   477G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme0n1p2 259:3    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme0n1p3 259:4    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
Our new disk nvme1n1 is there, but it is not yet partitioned.
Partition the new disk
First, you need to replicate the partition schema on the new disk. Simply copy the partition table to the new drive using sfdisk (of course, you could also partition it manually):
sfdisk -d /dev/nvme0n1 | sfdisk /dev/nvme1n1
where /dev/nvme0n1 is the source drive and /dev/nvme1n1 is the target drive.
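This sfdisk round-trip matches the MBR table we found earlier. If your drives carried a GPT table instead, sgdisk is the usual tool for replication; a sketch (note the argument order: the target drive is given to --replicate, the source drive comes last):

sgdisk --replicate=/dev/nvme1n1 /dev/nvme0n1
sgdisk -G /dev/nvme1n1

The second command randomizes the disk and partition GUIDs so that the clone does not collide with the source.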
You should now see this output:
Checking that no-one is using this disk right now ... OK

Disk /dev/nvme1n1: 477 GiB, 512110190592 bytes, 1000215216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Script header accepted.
>>> Created a new DOS disklabel with disk identifier 0x025ae6fe.
/dev/nvme1n1p1: Created a new partition 1 of type 'Linux raid autodetect' and of size 32 GiB.
/dev/nvme1n1p2: Created a new partition 2 of type 'Linux raid autodetect' and of size 512 MiB.
/dev/nvme1n1p3: Created a new partition 3 of type 'Linux raid autodetect' and of size 444.4 GiB.
/dev/nvme1n1p4: Done.

New situation:

Device         Boot     Start        End   Sectors   Size Id Type
/dev/nvme1n1p1           2048   67110911  67108864    32G fd Linux raid autodetect
/dev/nvme1n1p2       67110912   68159487   1048576   512M fd Linux raid autodetect
/dev/nvme1n1p3       68159488 1000213167 932053680 444.4G fd Linux raid autodetect

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
├─nvme1n1p1 259:5    0    32G  0 part
├─nvme1n1p2 259:6    0   512M  0 part
└─nvme1n1p3 259:7    0 444.4G  0 part
nvme0n1     259:1    0   477G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme0n1p2 259:3    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme0n1p3 259:4    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
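One precaution before adding the partitions: if the replacement drive was previously part of another mdadm array (e.g. a refurbished disk), it may still carry old RAID superblocks. Clearing them is a cheap safety step (a sketch for our three partitions; on a factory-new disk this simply reports that no superblock was found):

mdadm --zero-superblock /dev/nvme1n1p1
mdadm --zero-superblock /dev/nvme1n1p2
mdadm --zero-superblock /dev/nvme1n1p3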
Add new disk to the array
You need to do this for each partition.
mdadm /dev/md0 -a /dev/nvme1n1p1
mdadm /dev/md1 -a /dev/nvme1n1p2
mdadm /dev/md2 -a /dev/nvme1n1p3
Now you should see this:
~# cat /proc/mdstat
Personalities : [raid1] [linear] [multipath] [raid0] [raid6] [raid5] [raid4] [raid10]
md2 : active raid1 nvme1n1p3[2] nvme0n1p3[0]
      465895744 blocks super 1.2 [2/1] [U_]
        resync=DELAYED
      bitmap: 4/4 pages [16KB], 65536KB chunk

md0 : active raid1 nvme1n1p1[2] nvme0n1p1[0]
      33521664 blocks super 1.2 [2/1] [U_]
      [========>............]  recovery = 44.1% (14805888/33521664) finish=1.5min speed=200056K/sec

md1 : active raid1 nvme1n1p2[2] nvme0n1p2[0]
      523712 blocks super 1.2 [2/1] [U_]
        resync=DELAYED

~# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
nvme1n1     259:0    0   477G  0 disk
├─nvme1n1p1 259:5    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme1n1p2 259:6    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme1n1p3 259:7    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
nvme0n1     259:1    0   477G  0 disk
├─nvme0n1p1 259:2    0    32G  0 part
│ └─md0       9:0    0    32G  0 raid1 [SWAP]
├─nvme0n1p2 259:3    0   512M  0 part
│ └─md1       9:1    0 511.4M  0 raid1 /boot
└─nvme0n1p3 259:4    0 444.4G  0 part
  └─md2       9:2    0 444.3G  0 raid1 /
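To follow the rebuild progress without retyping the command, watch works nicely:

watch -n 5 cat /proc/mdstat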
Bootloader installation
Wait for the resync to complete, just in case.
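If you want to block until all arrays are back in sync (handy in scripts), mdadm can wait for the resync itself:

mdadm --wait /dev/md0 /dev/md1 /dev/md2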
Since the serial number of the disk has changed, we need to generate a new device map for GRUB2. Just reinstall GRUB:
grub-install /dev/nvme1n1
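On Debian it should not hurt to also regenerate the GRUB configuration afterwards; an optional extra step:

update-grub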
Tested on
- Debian GNU/Linux 9.13 (stretch)
- Hetzner hosted server
- mdadm - v3.4 - 28th January 2016