There appears to be a race condition that prevents the RAID arrays from starting properly. I have experienced the same issue on two separate servers. This is what I get on one of those servers:
Once the removed device is re-added to the RAID array, everything works as expected. The affected disk appears to be random, i.e. either of the two partitions can fail to start.
I have replicated the issue in QEMU using a single-disk RAID array.
[root@test ~]# cat /proc/mdstat
Personalities : [raid1]
md125 : active raid1 vda2[0]
      1046528 blocks super 1.2 [1/1] [U]

md126 : active raid1 vda4[0]
      6288384 blocks super 1.2 [1/1] [U]
      bitmap: 0/1 pages [0KB], 65536KB chunk

md127 : active raid1 vda3[0]
      2094080 blocks super 1.2 [1/1] [U]

unused devices: <none>
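For context, a one-device mirror like those above can be created with mdadm; the command below is a sketch using the device names from this test VM, not the exact command I ran (mdadm requires --force before it will build a RAID1 with a single member):

# hedged example: create a single-device RAID1 on /dev/vda4 (names from this test VM)
mdadm --create /dev/md/root --force --metadata=1.2 --level=1 --raid-devices=1 /dev/vda4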
This is the partition structure.
[root@test ~]# lsblk
NAME             MAJ:MIN RM  SIZE RO TYPE  MOUNTPOINTS
fd0                2:0    1    4K  0 disk
sr0               11:0    1  1.1G  0 rom
vda              254:0    0   10G  0 disk
├─vda1           254:1    0    1M  0 part
├─vda2           254:2    0    1G  0 part
│ └─md125          9:125  0 1022M  0 raid1 /boot
├─vda3           254:3    0    2G  0 part
│ └─md127          9:127  0    2G  0 raid1
│   └─swap_crypt 252:1    0    2G  0 crypt [SWAP]
└─vda4           254:4    0    7G  0 part
  └─md126          9:126  0    6G  0 raid1
    └─root_crypt 252:0    0    6G  0 crypt /var/log
From my testing, the root RAID array fails to start roughly 2 times out of 10. After a minute and a half it times out and I can SSH into QEMU. I can drop to a shell from the boot options and assemble the array using mdadm. However, I cannot decrypt the root partition since the job has already timed out.
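For illustration, the manual recovery from the emergency shell amounts to roughly the following (a sketch; /dev/md/root and /dev/vda4 are the names from this test layout):

# assemble the array that failed to start (names from this QEMU layout)
mdadm --assemble /dev/md/root /dev/vda4
# the array comes up, but by this point the systemd cryptsetup job for
# root_crypt has already timed out, so the boot cannot continue from here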
Here is my systemd-tool fstab.
[root@test ~]# cat /etc/mkinitcpio-systemd-tool/config/fstab
# This file is part of https://github.com/random-archer/mkinitcpio-systemd-tool
# REQUIRED READING:
# * https://github.com/random-archer/mkinitcpio-systemd-tool/wiki/Root-vs-Fstab
# * https://github.com/random-archer/mkinitcpio-systemd-tool/wiki/System-Recovery
# fstab: mappings for direct partitions in initramfs:
# * file location in initramfs: /etc/fstab
# * file location in real-root: /etc/mkinitcpio-systemd-tool/config/fstab
# fstab format:
# https://wiki.archlinux.org/index.php/Fstab
# how fstab is used by systemd:
# https://www.freedesktop.org/software/systemd/man/systemd-fstab-generator.html
# https://github.com/systemd/systemd/blob/master/src/fstab-generator/fstab-generator.c
# note:
# * remove "root=/dev/mapper/root" stanza from kernel command line
# * provide here root partition mapping (instead of kernel command line)
# * ensure that mapper-path in fstab corresponds to mapper-name in crypttab
# * for x-mount options see: https://www.freedesktop.org/software/systemd/man/systemd.mount.html
# <block-device> <mount-point> <fs-type> <mount-options> <dump> <fsck>
# /dev/mapper/root /sysroot auto x-systemd.device-timeout=9999h 0 1
# /dev/mapper/swap none swap x-systemd.device-timeout=9999h 0 0
/dev/mapper/root_crypt /sysroot auto x-systemd.device-timeout=9999h 0 1
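As an aside, the mount unit that systemd-fstab-generator derives from this line can be inspected from the emergency shell; sysroot.mount is the standard unit name for /sysroot in the initramfs:

systemctl cat sysroot.mount
systemctl list-dependencies sysroot.mount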
Here is the systemd-tool crypttab.
[root@test ~]# cat /etc/mkinitcpio-systemd-tool/config/crypttab
# This file is part of https://github.com/random-archer/mkinitcpio-systemd-tool
# REQUIRED READING:
# * https://github.com/random-archer/mkinitcpio-systemd-tool/wiki/Root-vs-Fstab
# * https://github.com/random-archer/mkinitcpio-systemd-tool/wiki/System-Recovery
# crypttab: mappings for encrypted partitions in initramfs
# * file location in initramfs: /etc/crypttab
# * file location in real-root: /etc/mkinitcpio-systemd-tool/config/crypttab
# crypttab format:
# https://wiki.archlinux.org/index.php/Dm-crypt/System_configuration#crypttab
# how crypttab is used by systemd:
# https://www.freedesktop.org/software/systemd/man/systemd-cryptsetup-generator.html
# https://github.com/systemd/systemd/blob/master/src/cryptsetup/cryptsetup-generator.c
# note:
# * provide here mapper partition UUID (instead of kernel command line)
# * use password/keyfile=none to force cryptsetup password agent prompt
# * ensure that mapper-path in fstab corresponds to mapper-name in crypttab
# * for x-mount options see: https://www.freedesktop.org/software/systemd/man/systemd.mount.html
# <mapper-name> <block-device> <password/keyfile> <crypto-options>
# root UUID={{UUID_ROOT}} none luks
# swap UUID={{UUID_SWAP}} none luks
root_crypt /dev/md/root none luks
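Since this crypttab refers to the array by its /dev/md/root name, one way to confirm that udev created the symlink and that the array is assembled is (standard commands, names from this layout):

ls -l /dev/md/
mdadm --detail /dev/md/root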
My hooks are HOOKS=(base systemd autodetect keyboard sd-vconsole modconf block mdadm_udev sd-encrypt filesystems fsck systemd-tool).
This issue is present only with systemd-tool. I have tested with both unencrypted and encrypted RAID in QEMU.
On the servers one can still boot after the failed RAID device is removed, since there is a redundant device, i.e. they are two-device RAID arrays, unlike the QEMU image. This is what I have found from testing on the two servers:
- the issue affects different physical disks, i.e. I checked the PTUUIDs;
- the issue affects different partitions, i.e. I have separate RAID devices for different partitions;
- the issue appears to affect only a single RAID device at a time (one of the servers has 6 disks and 18 RAID 1 arrays);
- the missing device is listed as removed by mdadm --detail and is not identified by a path in /dev, i.e. it just says "removed" without any further information;
- once re-added, RAID rebuilds the device (see the sketch after this list);
- if the affected partition is large and has a bitmap, the rebuild takes around 5 seconds;
- if the affected partition is small and does not have a bitmap, it takes slightly longer to mirror;
- mdadm logs simply state "active with 1 out of 2 mirrors"; and
- the affected partition times out on reboot after 30 seconds.
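The re-add step mentioned in the list is a plain mdadm manage operation; the device names below are placeholders for whichever array and removed member are affected:

# hypothetical names; substitute the actual array and removed member
mdadm /dev/md127 --re-add /dev/sdb3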
Here are the dmesg logs from one reboot.
[ 1.546487] md/raid1:md127: active with 2 out of 2 mirrors
[ 1.554870] md127: detected capacity change from 0 to 3835518976
[ 1.617467] md/raid1:md126: active with 2 out of 2 mirrors
[ 1.617478] md126: detected capacity change from 0 to 2093056
[ 1.739809] device-mapper: uevent: version 1.0.3
[ 1.739864] device-mapper: ioctl: 4.48.0-ioctl (2023-03-01) initialised: dm-devel@lists.linux.dev
[ 1.796344] raid6: skipped pq benchmark and selected avx2x4
[ 1.796347] raid6: using avx2x2 recovery algorithm
[ 1.799177] xor: automatically using best checksumming function avx
[ 1.903424] Btrfs loaded, zoned=yes, fsverity=yes
[ 32.218708] md/raid1:md125: active with 1 out of 2 mirrors
[ 32.218733] md125: detected capacity change from 0 to 67041280
Here is another reboot that affected a different RAID device.
[ 1.576421] md/raid1:md126: active with 2 out of 2 mirrors
[ 1.576439] md126: detected capacity change from 0 to 67041280
[ 1.644082] md/raid1:md125: active with 2 out of 2 mirrors
[ 1.644101] md125: detected capacity change from 0 to 2093056
[ 1.793562] raid6: skipped pq benchmark and selected avx2x4
[ 1.793565] raid6: using avx2x2 recovery algorithm
[ 1.796313] xor: automatically using best checksumming function avx
[ 1.900468] Btrfs loaded, zoned=yes, fsverity=yes
[ 31.559597] md/raid1:md127: active with 1 out of 2 mirrors
[ 31.600410] md127: detected capacity change from 0 to 3835518976
Usually in such cases there is a need to declare an explicit inter-unit dependency for root_crypt_mdarray.mount, and by root_crypt_mdarray.mount I mean: whatever systemd device unit is manifested by systemd core after the md array becomes "really ready".
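A minimal sketch of such a dependency, assuming the assembled array appears as dev-md-root.device (the actual device unit name depends on how udev names the array, and the drop-in path is likewise hypothetical):

# systemd-cryptsetup@root_crypt.service.d/md-dependency.conf (hypothetical drop-in)
[Unit]
Requires=dev-md-root.device
After=dev-md-root.device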