Panic when trying to remove files from _one_ of my datasets #16939

arzeth · 2025-01-09T14:09:36Z

System information

Type	Version/Name
Distribution Name	Arch Linux
Kernel Version	linux-xanmod-edge 6.12.5-1 PREEMPT_RT
Architecture	x86_64
OpenZFS Version	2.2.7
CPU	Ryzen 5 2600
RAM	DDR4 3200MHz, 32GB*2, no ECC
MB	MSI B450M PRO-M2 MAX

grub options: nospectre_v2 split_lock_mitigate=0 zfs.spa_slop_shift=6 zfs.zfs_dmu_offset_next_sync=0 zfs.zfs_txg_timeout=20 zfs.zio_taskq_batch_pct=42 zfs.zfs_arc_max=1610612736 nvme.max_host_mem_size_mb=256 scsi_mod.use_blk_mq=1 ^{(I read somewhere zfs_dmu_offset_next_sync=0 supposedly decreases the chance of corruption; txg_timeout is because the constant noise of my Seagate Skyhawk HDD was unbearable; zio_taskq_batch_pct is to avoid my PC becoming unresponsive during heavy writing to datasets with compression=zstd-19; I limited ARC size to 1.5GiB to save RAM)}

I've been using this config (I mean an RT kernel) since 2024-12-15, i.e. for 23 days. It worked until today, except that waking up didn't succeed in most cases (complete lock-up). Probably nvidia's driver at fault (since they don't support RT officially)?... no idea. Lock-ups occurred also in random times but after many hours of uptime. A few times, immediately after some of successful wake-ups, I saw ZFS-related traces mentioning something txg, unlock; but I ignored them......

My pools:

NVMe SSD: one pool named zroot;
HDD: three pools.

all have ashift 12, and properly aligned partitions.

So, 2 days ago I was seeding torrents (using files on the HDD), and then I decided to load up an LLM (8 GB .gguf file on an unencrypted partition on the SSD; llama.cpp), and during the loading, my PC suddenly became fully unresponsive (even no SSH and alt+print+b). At that moment what was happening:

many random reads on an HDD (many encrypted datasets with compression) by qBittorrent
one sequential read of a big file (with mmaping) on an SSD (pool zroot, unencrypted dataset zroot/data/m/c/rseq without compression) by llama.cpp
small writes on an SSD (pool zroot, encrypted dataset zroot/data/home) by qBittorrent (metadata), Firefox.
all while also using a non-stock RT kernel.

I found the timing of the lock-up suspicious, so today I decided to repeat the same scenario. This time the LLM did get fully loaded (in ~28 secs), but... 5 seconds later my PC became unresponsive again (it's like there is some connection to zfs.zfs_txg_timeout=20).

I rebooted my PC, selected the stock LTS kernel, then I saw:

Arch Linux 6.6.68-1-lts (tty1)

arzeth-old pc login: arzeth (automatic login)
Last login Wed Jan  8 10:37:02 on tty1
[   65.596542] VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 5)
[   65.596573] PANIC at zfs_dir.c:464:zfs_unlinked_add()

Afterwards, I booted from a LiveCD (even earlier kernel: Linux sysrescue 6.1.53-1-lts #1 SMP PREEMPT_DYNAMIC Wed, 13 Sep 2023 09:32:00 +0000 x86_64 GNU/Linux), installed ZFS 2.2.7 (same latest version), mounted all pools, ran zpool scrub zroot, then zpool status zroot:

  pool: zroot
 state: ONLINE
  scan: scrub repaired 0B in 00:12:38 with 0 errors on Wed Jan  8 15:19:16 2025
config:

	NAME         STATE     READ WRITE CKSUM
	zroot        ONLINE       0     0     0
	  nvme0n1p4  ONLINE       0     0     0

errors: No known data errors

[root@sysrescue /tmp]#  smartctl -a /dev/nvme0
smartctl 7.4 2023-08-01 r5530 [x86_64-linux-6.1.53-1-lts] (local build)
Copyright (C) 2002-23, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       m.2 Smartbuy PS5013-2280T 1024GB
Serial Number:                      296E079B18FC00010017
Firmware Version:                   EDFM00E3
PCI Vendor/Subsystem ID:            0x1987
IEEE OUI Identifier:                0x6479a7
Total NVM Capacity:                 1,024,209,543,168 [1.02 TB]
Unallocated NVM Capacity:           0
Controller ID:                      1
NVMe Version:                       1.3
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,024,209,543,168 [1.02 TB]
Namespace 1 Formatted LBA Size:     4096
Namespace 1 IEEE EUI-64:            6479a7 2ae2673137
Local Time is:                      Wed Jan  8 20:16:53 2025 UTC
Firmware Updates (0x12):            1 Slot, no Reset required
Optional Admin Commands (0x001f):   Security Format Frmw_DL NS_Mngmt Self_Test
Optional NVM Commands (0x005e):     Wr_Unc DS_Mngmt Wr_Zero Sav/Sel_Feat Timestmp
Log Page Attributes (0x0e):         Cmd_Eff_Lg Ext_Get_Lg Telmtry_Lg
Maximum Data Transfer Size:         64 Pages
Warning  Comp. Temp. Threshold:     68 Celsius
Critical Comp. Temp. Threshold:     70 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     4.50W       -        -    0  0  0  0        0       0
 1 +     2.70W       -        -    1  1  1  1        0       0
 2 +     2.16W       -        -    2  2  2  2        0       0
 3 -   0.0700W       -        -    3  3  3  3     1000    1000
 4 -   0.0020W       -        -    4  4  4  4     5000   60000

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 -     512       0         1
 1 +    4096       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        25 Celsius
Available Spare:                    100%
Available Spare Threshold:          5%
Percentage Used:                    11%
Data Units Read:                    351,607,959 [180 TB]
Data Units Written:                 157,926,176 [80.8 TB]
Host Read Commands:                 3,239,887,842
Host Write Commands:                5,191,310,103
Controller Busy Time:               92,845
Power Cycles:                       1,910
Power On Hours:                     27,620
Unsafe Shutdowns:                   201
Media and Data Integrity Errors:    0
Error Information Log Entries:      351,361
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0

Error Information (NVMe Log 0x01, 16 of 16 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS  Message
  0     351361     0  0x0014  0x4005  0x028            0     0     -  Invalid Field in Command

Self-test Log (NVMe Log 0x06)
Self-test status: No self-test in progress
No Self-tests Logged

The bad dataset:

[root@sysrescue /tmp/x]#  zfs get all zroot/data/home
NAME             PROPERTY              VALUE                  SOURCE
zroot/data/home  type                  filesystem             -
zroot/data/home  creation              Sun Nov 28 16:27 2021  -
zroot/data/home  used                  76.0G                  -
zroot/data/home  available             16.9G                  -
zroot/data/home  referenced            75.0G                  -
zroot/data/home  compressratio         1.47x                  -
zroot/data/home  mounted               yes                    -
zroot/data/home  quota                 none                   default
zroot/data/home  reservation           none                   default
zroot/data/home  recordsize            256K                   local
zroot/data/home  mountpoint            /home                  local
zroot/data/home  sharenfs              off                    default
zroot/data/home  checksum              on                     default
zroot/data/home  compression           zstd-17                local
zroot/data/home  atime                 off                    inherited from zroot/data
zroot/data/home  devices               off                    inherited from zroot
zroot/data/home  exec                  on                     default
zroot/data/home  setuid                on                     default
zroot/data/home  readonly              off                    default
zroot/data/home  zoned                 off                    default
zroot/data/home  snapdir               hidden                 default
zroot/data/home  aclmode               discard                default
zroot/data/home  aclinherit            restricted             default
zroot/data/home  createtxg             13                     -
zroot/data/home  canmount              on                     default
zroot/data/home  xattr                 sa                     inherited from zroot
zroot/data/home  copies                1                      default
zroot/data/home  version               5                      -
zroot/data/home  utf8only              on                     -
zroot/data/home  normalization         formD                  -
zroot/data/home  casesensitivity       sensitive              -
zroot/data/home  vscan                 off                    default
zroot/data/home  nbmand                off                    default
zroot/data/home  sharesmb              off                    default
zroot/data/home  refquota              none                   default
zroot/data/home  refreservation        none                   default
zroot/data/home  guid                  6560759384699375364    -
zroot/data/home  primarycache          all                    default
zroot/data/home  secondarycache        all                    default
zroot/data/home  usedbysnapshots       0B                     -
zroot/data/home  usedbydataset         75.0G                  -
zroot/data/home  usedbychildren        933M                   -
zroot/data/home  usedbyrefreservation  0B                     -
zroot/data/home  logbias               latency                default
zroot/data/home  objsetid              77                     -
zroot/data/home  dedup                 off                    local
zroot/data/home  mlslabel              none                   default
zroot/data/home  sync                  standard               default
zroot/data/home  dnodesize             legacy                 inherited from zroot
zroot/data/home  refcompressratio      1.47x                  -
zroot/data/home  written               75.0G                  -
zroot/data/home  logicalused           108G                   -
zroot/data/home  logicalreferenced     107G                   -
zroot/data/home  volmode               default                default
zroot/data/home  filesystem_limit      none                   default
zroot/data/home  snapshot_limit        none                   default
zroot/data/home  filesystem_count      none                   default
zroot/data/home  snapshot_count        none                   default
zroot/data/home  snapdev               hidden                 default
zroot/data/home  acltype               posix                  inherited from zroot
zroot/data/home  context               none                   default
zroot/data/home  fscontext             none                   default
zroot/data/home  defcontext            none                   default
zroot/data/home  rootcontext           none                   default
zroot/data/home  relatime              on                     inherited from zroot
zroot/data/home  redundant_metadata    all                    default
zroot/data/home  overlay               on                     default
zroot/data/home  encryption            aes-256-gcm            -
zroot/data/home  keylocation           none                   default
zroot/data/home  keyformat             passphrase             -
zroot/data/home  pbkdf2iters           350000                 -
zroot/data/home  encryptionroot        zroot                  -
zroot/data/home  keystatus             available              -
zroot/data/home  special_small_blocks  0                      default
zroot/data/home  prefetch              all                    default

Its pool:

[root@sysrescue /tmp/x]#  zpool get all zroot
NAME   PROPERTY                       VALUE                          SOURCE
zroot  size                           887G                           -
zroot  capacity                       94%                            -
zroot  altroot                        -                              default
zroot  health                         ONLINE                         -
zroot  guid                           5900436512089678044            -
zroot  version                        -                              default
zroot  bootfs                         zroot/ROOT/default             local
zroot  delegation                     on                             default
zroot  autoreplace                    off                            default
zroot  cachefile                      -                              default
zroot  failmode                       wait                           default
zroot  listsnapshots                  off                            default
zroot  autoexpand                     off                            default
zroot  dedupratio                     1.00x                          -
zroot  free                           44.5G                          -
zroot  allocated                      842G                           -
zroot  readonly                       off                            -
zroot  ashift                         12                             local
zroot  comment                        -                              default
zroot  expandsize                     -                              -
zroot  freeing                        0                              -
zroot  fragmentation                  58%                            -
zroot  leaked                         0                              -
zroot  multihost                      off                            default
zroot  checkpoint                     -                              -
zroot  load_guid                      18281612753199259114           -
zroot  autotrim                       off                            default
zroot  compatibility                  off                            default
zroot  bcloneused                     2.24M                          -
zroot  bclonesaved                    2.24M                          -
zroot  bcloneratio                    2.00x                          -
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           active                         local
zroot  feature@large_dnode            enabled                        local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  enabled                        local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             active                         local
zroot  feature@project_quota          active                         local
zroot  feature@device_removal         enabled                        local
zroot  feature@obsolete_counts        enabled                        local
zroot  feature@zpool_checkpoint       enabled                        local
zroot  feature@spacemap_v2            active                         local
zroot  feature@allocation_classes     enabled                        local
zroot  feature@resilver_defer         enabled                        local
zroot  feature@bookmark_v2            enabled                        local
zroot  feature@redaction_bookmarks    enabled                        local
zroot  feature@redacted_datasets      enabled                        local
zroot  feature@bookmark_written       enabled                        local
zroot  feature@log_spacemap           active                         local
zroot  feature@livelist               enabled                        local
zroot  feature@device_rebuild         enabled                        local
zroot  feature@zstd_compress          active                         local
zroot  feature@draid                  enabled                        local
zroot  feature@zilsaxattr             active                         local
zroot  feature@head_errlog            active                         local
zroot  feature@blake3                 enabled                        local
zroot  feature@block_cloning          active                         local
zroot  feature@vdev_zaps_v2           active                         local

Then I backed up this dataset into another dataset... on the same pool... (tar -I 'zstd -15 --long -T12' /m/el/r/home.tar.zst /home) successfully.

Then I tried to use zsh under my user, but zsh tried to overwrite a file using mv, the result is bad:
ps aux: 1000 40360 0.0 0.0 26600 764 pts/7 D+ 15:34 0:00 mv -f /home/arzeth/.zcompdump-sysrescue-5.9.0.1-dev.sysrescue.59 /home/arzeth/.zcompdump-sysrescue-5.9.0.1-dev

[ 7207.536498] VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 5)
[ 7207.536507] PANIC at zfs_dir.c:464:zfs_unlinked_add()
[ 7207.536511] Showing stack for process 40360
[ 7207.536514] CPU: 9 PID: 40360 Comm: mv Tainted: P        W  OE      6.1.53-1-lts #1 3321f1751995b4e489a9f363253659a863626916
[ 7207.536520] Hardware name: Micro-Star International Co., Ltd. MS-7B84/B450M PRO-M2 MAX (MS-7B84), BIOS A.I0 04/27/2023
[ 7207.536523] Call Trace:
[ 7207.536526]  <TASK>
[ 7207.536530]  dump_stack_lvl+0x48/0x60
[ 7207.536540]  spl_panic+0xf4/0x10c [spl 45e036db99f8bb7928be41aea061122d23d0d2f4]
[ 7207.536570]  ? zap_add_int+0x86/0xb0 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.536825]  zfs_unlinked_add+0x67/0x70 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537068]  zfs_link_destroy+0x3bc/0x440 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537308]  ? dmu_buf_will_dirty_impl+0x154/0x210 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537548]  zfs_rename+0x10e8/0x1730 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.537797]  ? __legitimize_path+0x27/0x60
[ 7207.537806]  zpl_rename2+0xa7/0x130 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 7207.538043]  vfs_rename+0xa69/0xc10
[ 7207.538052]  ? do_renameat2+0x52d/0x5a0
[ 7207.538057]  do_renameat2+0x52d/0x5a0
[ 7207.538067]  __x64_sys_renameat2+0x4f/0x60
[ 7207.538073]  do_syscall_64+0x60/0x90
[ 7207.538078]  ? exc_page_fault+0x7c/0x180
[ 7207.538084]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 7207.538090] RIP: 0033:0x7f091902ac8e
[ 7207.538115] Code: 80 18 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 f3 0f 1e fa 49 89 ca b8 3c 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 49
[ 7207.538118] RSP: 002b:00007ffdd186cb98 EFLAGS: 00000206 ORIG_RAX: 000000000000013c
[ 7207.538124] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f091902ac8e
[ 7207.538127] RDX: 00000000ffffff9c RSI: 00007ffdd186dc64 RDI: 00000000ffffff9c
[ 7207.538130] RBP: 00007ffdd186cd10 R08: 0000000000000000 R09: 0000000000000000
[ 7207.538132] R10: 00007ffdd186dc9f R11: 0000000000000206 R12: 00007ffdd186dc9f
[ 7207.538134] R13: 00007ffdd186dc64 R14: 00000000ffffff9c R15: 00000000ffffff9c
[ 7207.538141]  </TASK>

This panic causes any process (even ls) trying to use this dataset to become a zombie process.

Then I decided to reboot (still using the livecd), because I wanted to know whether a panic would occur when deleting an old file.

So I tried rm a 3-year-old file (12 bytes) on this dataset. The error in dmesg is similar:

[ 2637.661779] VERIFY3(0 == zap_add_int(zfsvfs->z_os, zfsvfs->z_unlinkedobj, zp->z_id, tx)) failed (0 == 5)
[ 2637.661786] PANIC at zfs_dir.c:464:zfs_unlinked_add()
[ 2637.661790] Showing stack for process 4398
[ 2637.661793] CPU: 0 PID: 4398 Comm: rm Tainted: P        W  OE      6.1.53-1-lts #1 3321f1751995b4e489a9f363253659a863626916
[ 2637.661799] Hardware name: Micro-Star International Co., Ltd. MS-7B84/B450M PRO-M2 MAX (MS-7B84), BIOS A.I0 04/27/2023
[ 2637.661802] Call Trace:
[ 2637.661806]  <TASK>
[ 2637.661811]  dump_stack_lvl+0x48/0x60
[ 2637.661822]  spl_panic+0xf4/0x10c [spl 45e036db99f8bb7928be41aea061122d23d0d2f4]
[ 2637.661851]  ? zap_add_int+0x86/0xb0 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662105]  zfs_unlinked_add+0x67/0x70 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662346]  zfs_remove+0x7f6/0xa20 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662587]  zpl_unlink+0x64/0xb0 [zfs e6c99f07607bde3a9c94eccf32a35fc156462904]
[ 2637.662823]  vfs_unlink+0x112/0x280
[ 2637.662830]  do_unlinkat+0x148/0x320
[ 2637.662838]  __x64_sys_unlinkat+0x37/0x70
[ 2637.662843]  do_syscall_64+0x60/0x90
[ 2637.662849]  ? handle_mm_fault+0xdf/0x2d0
[ 2637.662854]  ? syscall_exit_to_user_mode+0x2b/0x40
[ 2637.662860]  ? do_syscall_64+0x6c/0x90
[ 2637.662865]  ? exc_page_fault+0x7c/0x180
[ 2637.662871]  entry_SYSCALL_64_after_hwframe+0x64/0xce
[ 2637.662877] RIP: 0033:0x7fa9f0e5d52b
[ 2637.662898] Code: 77 05 c3 0f 1f 40 00 48 8b 15 01 98 13 00 f7 d8 64 89 02 b8 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 07 01 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 97 13 00 f7 d8 64 89 01 48
[ 2637.662901] RSP: 002b:00007fff259ee6d8 EFLAGS: 00000246 ORIG_RAX: 0000000000000107
[ 2637.662906] RAX: ffffffffffffffda RBX: 000055e40c2486c0 RCX: 00007fa9f0e5d52b
[ 2637.662909] RDX: 0000000000000000 RSI: 000055e40c2474a0 RDI: 00000000ffffff9c
[ 2637.662912] RBP: 000055e40c247410 R08: 000055e40c2474a0 R09: 00007fff259ee7ec
[ 2637.662914] R10: 000055e40c249630 R11: 0000000000000246 R12: 0000000000000000
[ 2637.662916] R13: 00007fff259ee7f0 R14: 0000000000000000 R15: 000055e40c2486c0
[ 2637.662922]  </TASK>

BTW, even after a panic occurs, I can still use other datasets in this pool (without rebooting).

I had 2 datasets inside this dataset, so I made another experiment: despite ZFS already having had a panic, I tried to move it out of ..../home/ (I am not talking about mountpoint): zfs rename zroot/data/home{/,_}arzeth_dev, and got a very similar message in dmesg. After the panic, I didn't reboot, and I ran zfs list which showed that the dataset renaming succeeded (even though zfs rename .... became a zombie process).

Then I tried to rename another inner (child) dataset (without rebooting), but this time I saw no changes in zfs list, so I've rebooted, ran zfs list again just in case (the first renaming did indeed succeed). I haven't tried yet to zfs rename this remaining inner dataset again.

Panic does not occur when creating folders and moving files into them (in this corrupted dataset).

So what should I do?

Try zfs destroy zroot/data/home, but what if this irreversibly corrupts the pool due to a panic in the middle of the process?
Recreate the pool (back up everything, zpool destroy zroot, zpool create, zfs create, cp, cp, cp)
Something else?

Maybe also ban PREEMPT_RT in configure.ac, just like recent 410287f banned unsupported kernel versions?

The text was updated successfully, but these errors were encountered:

amotin · 2025-01-09T16:15:10Z

I would start from memory check, considering it is non-ECC, to not cause more damage if it is the cause. Then make sure you have a backup, which is on your second option already. Then I would guess you should have a good chance to destroy only the specific dataset, since the ZAP it crashed on is dataset-specific, but it is difficult to be sure without knowing what's wrong about it exactly.

arzeth added the Type: Defect Incorrect behavior (e.g. crash, hang) label Jan 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Panic when trying to remove files from _one_ of my datasets #16939

Panic when trying to remove files from _one_ of my datasets #16939

arzeth commented Jan 9, 2025

amotin commented Jan 9, 2025

Panic when trying to remove files from _one_ of my datasets #16939

Panic when trying to remove files from _one_ of my datasets #16939

Comments

arzeth commented Jan 9, 2025

System information

amotin commented Jan 9, 2025