Misc kernel panics when using fio with brd-backed pool #16707

Open
tonyhutter opened this issue Oct 30, 2024 · 0 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@tonyhutter
Contributor

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | RHEL |
| Distribution Version | 8.10 |
| Kernel Version | 4.18 |
| Architecture | x86-64 |
| OpenZFS Version | master (6187b19) |

Describe the problem you're observing

Miscellaneous kernel panics occur when running fio against a brd-backed pool (see the reproducer below). The panic reproduces 100% of the time. Note that O_DIRECT is not in use here: fio runs with --direct=0.

Describe how to reproduce the problem

The reproducer is below. It panics within a few seconds of running. Caution: as written, it creates a 100GB ramdisk, so adjust the values to suit your system.

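#!/bin/bash

devs=1                        # number of brd ramdisks backing the pool
size=$((100000000 / devs))    # brd's rd_size is in KiB, per ramdisk
recordsize=4k
jobs=32

# Create the ramdisks and collect their device paths.
sudo modprobe brd rd_nr=$devs rd_size=$size
alldevs=""
for i in $(seq 0 $((devs - 1))) ; do
	alldevs="$alldevs /dev/ram$i"
done

# $alldevs is left unquoted on purpose so it splits into one argument per device.
sudo ./zpool create -o ashift=12 tank $alldevs
sudo ./zfs set compression=off tank
sudo ./zfs set recordsize=$recordsize tank
sudo ./zfs set atime=off tank
sudo ./zpool get ashift tank

# Buffered (--direct=0) sequential writes from $jobs jobs to a single file.
fio --name=fiotest --filename=/tank/test1 --size=50Gb --rw=write --bs=$recordsize --direct=0 --numjobs=$jobs --ioengine=libaio --iodepth=128 --group_reporting --runtime=20 --startdelay=1

# Tear down the pool and release the ramdisks.
sudo ./zpool destroy tank
sudo rmmod brd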
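
For anyone adapting the reproducer to a smaller machine, one possible way to choose rd_size is to derive it from the memory currently available. The following is only a sketch and not part of the original reproducer: the helper name ramdisk_kib_per_dev and the 25% figure are assumptions chosen for illustration, and it has not been checked whether the panic still reproduces with a smaller ramdisk.

```shell
#!/bin/bash
# Hypothetical helper (an assumption, not from the original report):
# compute a per-ramdisk rd_size in KiB as a percentage of available memory.

# ramdisk_kib_per_dev AVAILABLE_KIB DEVS PERCENT
# Prints the per-device size in KiB. Pure integer arithmetic, no side effects.
ramdisk_kib_per_dev() {
	local avail_kib=$1 devs=$2 percent=$3
	echo $(( avail_kib * percent / 100 / devs ))
}

# MemAvailable in /proc/meminfo is reported in kB, the same unit as rd_size.
avail_kib=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo 2>/dev/null)
devs=1
size=$(ramdisk_kib_per_dev "${avail_kib:-0}" "$devs" 25)

# Only print the command here; run it by hand once the numbers look sane.
echo "modprobe brd rd_nr=$devs rd_size=$size"
```

If the ramdisk is made smaller this way, the fio --size value in the reproducer would presumably need to be reduced as well so the test file still fits in the pool.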

Include any warning/errors/backtraces from the system logs

Example 1:

[  716.288390] list_del corruption. next->prev should be ffff9149749da200, but was 0b9fb17ccc1cb794
[  716.297199] ------------[ cut here ]------------
[  716.301815] kernel BUG at lib/list_debug.c:56!
[  716.306263] invalid opcode: 0000 [#1] SMP NOPTI
[  716.310794] CPU: 30 PID: 416234 Comm: dp_sync_taskq Tainted: P        W  OE  X  -------- -  - 4.18.0-553.16.1.1toss.t4.x86_64 #1
[  716.322339] Hardware name: Viking Enterprise Solutions VSSEP1EC/VSSEP1EC, BIOS RWH3LJ-10.09.04 05/12/2023
[  716.331891] RIP: 0010:__list_del_entry_valid.cold.1+0x20/0x48
[  716.337635] Code: 7b 74 a6 e8 ac 3f c6 ff 0f 0b 48 89 fe 48 89 c2 48 c7 c7 18 7c 74 a6 e8 98 3f c6 ff 0f 0b 48 c7 c7 c8 7c 74 a6 e8 8a 3f c6 ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 88 7c 74 a6 e8 76 3f c6 ff 0f 0b
[  716.356372] RSP: 0018:ffffa50dce4abb60 EFLAGS: 00010246
[  716.361590] RAX: 0000000000000054 RBX: ffff9143d99fb500 RCX: 0000000000000000
[  716.368717] RDX: 0000000000000000 RSI: ffff917deef9e698 RDI: ffff917deef9e698
[  716.375847] RBP: ffff9143d99fb508 R08: 0000000000000000 R09: c0000000ffff7fff
[  716.382973] R10: 0000000000000001 R11: ffffa50dce4ab980 R12: dead000000000200
[  716.390097] R13: dead000000000100 R14: ffff9149749da200 R15: ffff9149749da200
[  716.397220] FS:  0000000000000000(0000) GS:ffff917deef80000(0000) knlGS:0000000000000000
[  716.405296] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  716.411034] CR2: 00001554e9743ff8 CR3: 0000000116136000 CR4: 0000000000350ee0
[  716.418158] Call Trace:
[  716.420606]  ? __die_body+0x1a/0x60
[  716.424096]  ? die+0x2a/0x50
[  716.426981]  ? do_trap+0xe5/0x110
[  716.430293]  ? __list_del_entry_valid.cold.1+0x20/0x48
[  716.435422]  ? do_invalid_op+0x36/0x40
[  716.439168]  ? __list_del_entry_valid.cold.1+0x20/0x48
[  716.444299]  ? invalid_op+0x14/0x20
[  716.447792]  ? __list_del_entry_valid.cold.1+0x20/0x48
[  716.452934]  dbuf_sync_list+0x5e/0x120 [zfs]
[  716.457527]  dbuf_sync_indirect+0xe9/0x180 [zfs]
[  716.462229]  ? zio_nowait+0xc0/0x170 [zfs]
[  716.466433]  dbuf_sync_list+0xad/0x120 [zfs]
[  716.470793]  dbuf_sync_indirect+0xe9/0x180 [zfs]
[  716.475498]  ? __mutex_lock.isra.11+0x1d1/0x4c0
[  716.480032]  dbuf_sync_list+0xad/0x120 [zfs]
[  716.484391]  dnode_sync+0x377/0xa70 [zfs]
[  716.488507]  ? _cond_resched+0x15/0x30
[  716.492258]  ? __mutex_lock.isra.11+0x1d1/0x4c0
[  716.496785]  ? call_function_single_interrupt+0xa/0x20
[  716.501925]  ? _cond_resched+0x15/0x30
[  716.505677]  sync_dnodes_task+0x92/0x1a0 [zfs]
[  716.510219]  taskq_thread+0x33e/0x6f0 [spl]
[  716.514413]  ? wake_up_q+0x60/0x60
[  716.517819]  ? dnode_rele_task+0x70/0x70 [zfs]
[  716.522360]  ? taskq_thread_spawn+0x60/0x60 [spl]
[  716.527065]  kthread+0x14c/0x170
[  716.530298]  ? set_kthread_struct+0x50/0x50
[  716.534475]  ret_from_fork+0x35/0x40

Example 2:

[  482.358117] BUG: unable to handle kernel paging request at 0000000002474660
[  482.403133] PGD 0 P4D 0 
[  482.405671] Oops: 0000 [#2] SMP NOPTI
[  482.427711] CPU: 3 PID: 378357 Comm: dp_sync_taskq Tainted: P      D W  OE  X  -------- -  - 4.18.0-553.16.1.1toss.t4.x86_64 #1
[  484.184920] RIP: 0033:0x7ffff7dda57f
[  484.186560] Hardware name: Viking Enterprise Solutions VSSEP1EC/VSSEP1EC, BIOS RWH3LJ-10.09.04 05/12/2023
[  484.190129] Code: c4 e9 8f 00 00 00 0f 1f 44 00 00 8b b3 f4 02 00 00 85 f6 74 73 48 8b 43 70 c7 44 24 64 00 00 00 00 48 c7 44 24 68 00 00 00 00 <48> 8b 40 08 48 89 44 24 10 48 8b 43 68 48 8b 40 08 48 89 44 24 30
[  484.199682] RIP: 0010:arc_released+0x11/0x30 [zfs]
[  484.218418] RSP: 002b:00007fffffffe630 EFLAGS: 00010206
[  484.223202] Code: c7 c7 10 6f d9 c1 e8 de bf fe ff e9 32 f8 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 31 c0 48 83 7f 10 00 74 11 48 8b 07 <48> 81 78 60 c0 03 f1 c1 0f 94 c0 0f b6 c0 e9 1c db 39 cf 66 66 2e
[  484.228421] RAX: 0000555555766b38 RBX: 00007ffff7ffe230 RCX: 0000555555554c00
[  484.247158] RSP: 0018:ffffab418e4cba50 EFLAGS: 00010206
[  484.254281] RDX: 00007fffffffe750 RSI: 0000000000000003 RDI: 00005555555552ad
[  484.254283] RBP: 0000000000000000 R08: 00007fffffffe760 R09: 00007ffff7ffe4f0
[  484.259500] 
[  484.266623] R10: 00007ffff7fcba90 R11: 00007ffff7fca000 R12: 0000000000000007
[  484.273747] RAX: 0000000002474600 RBX: ffff8ba276248000 RCX: 0000000000000000
[  484.275239] R13: 0000000000000001 R14: 00007ffff7fcba20 R15: 00005555555552ad
[  484.282363] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8b9bad6a56c0
[  484.289487] FS:  00007ffff7fc9040(0000) GS:ffff8bd9eeec0000(0000) knlGS:0000000000000000
[  484.296610] RBP: ffff8ba382162800 R08: 0000000000000013 R09: 0000000000000002
[  484.303733] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  484.311813] R10: 0000000000000000 R11: 0000000000000007 R12: ffff8b9bad6a56c0
[  484.318935] CR2: 0000555555766b40 CR3: 000000049d5b8000 CR4: 0000000000350ee0
[  484.324674] R13: ffff8b9ed5e9dfe0 R14: ffff8b9f56285200 R15: ffff8b9ed5e99c00
[  484.346047] FS:  0000000000000000(0000) GS:ffff8bd9ee8c0000(0000) knlGS:0000000000000000
[  484.354132] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  484.359870] CR2: 0000000002474660 CR3: 0000000161fc4000 CR4: 0000000000350ee0
[  484.366993] Call Trace:
[  484.369444]  ? __die_body+0x1a/0x60
[  484.372938]  ? no_context+0x1c0/0x3f0
[  484.376605]  ? remove_reference+0x1b0/0x1b0 [zfs]
[  484.381443]  ? __bad_area_nosemaphore+0x157/0x180
[  484.386148]  ? dmu_buf_unlock_parent+0xc0/0xc0 [zfs]
[  484.391254]  ? do_page_fault+0x37/0x13f
[  484.395090]  ? page_fault+0x1e/0x30
[  484.398585]  ? arc_released+0x11/0x30 [zfs]
[  484.402901]  dbuf_write+0x219/0x710 [zfs]
[  484.407046]  dbuf_sync_leaf+0x190/0x890 [zfs]
[  484.411544]  dbuf_sync_list+0x89/0x100 [zfs]
[  484.415946]  dbuf_sync_indirect+0x181/0x580 [zfs]
[  484.420779]  dbuf_sync_list+0x69/0x100 [zfs]
[  484.425174]  dbuf_sync_indirect+0x181/0x580 [zfs]
[  484.430010]  ? __raw_spin_unlock+0x5/0x20 [zfs]
[  484.434680]  ? dmu_objset_userquota_get_ids+0x476/0x6c0 [zfs]
[  484.440566]  dbuf_sync_list+0x69/0x100 [zfs]
[  484.444970]  dnode_sync+0x5f5/0xf50 [zfs]
[  484.449132]  ? __list_add+0x12/0x30 [zfs]
[  484.453308]  dmu_objset_sync_dnodes+0x91/0x130 [zfs]
[  484.458420]  sync_dnodes_task+0x40/0x1e0 [zfs]
[  484.462997]  taskq_thread+0x294/0x5e0 [spl]
[  484.467210]  ? wake_up_q+0x60/0x60
[  484.470616]  ? dnode_rele_task+0x70/0x70 [zfs]
[  484.475197]  ? taskq_lowest_id+0xc0/0xc0 [spl]
[  484.479644]  kthread+0x14c/0x170
[  484.482878]  ? set_kthread_struct+0x50/0x50
[  484.487062]  ret_from_fork+0x35/0x40
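
To map frames like the ones above back to source lines, it can help to first pull out just the reliable [zfs] frames from the log. The function below is a small sketch of that step; the name zfs_frames is hypothetical, and the pattern assumes the timestamped dmesg layout shown in the two examples. Frames the kernel marks with "?" are skipped because the unwinder does not vouch for them.

```shell
#!/bin/bash
# zfs_frames (hypothetical helper): read kernel log text on stdin and print
# each reliable [zfs] stack frame as "symbol+offset/size", in call order.
# Lines starting with "?" after the timestamp, RIP lines, and frames from
# other modules do not match the pattern and are dropped.
zfs_frames() {
	sed -n 's#^\[[^]]*\] \{1,\}\([A-Za-z_][A-Za-z0-9_.]*+0x[0-9a-f]*/0x[0-9a-f]*\) \[zfs\][[:space:]]*$#\1#p'
}

# Example using three lines taken from Example 2.
zfs_frames <<'EOF'
[  484.398585]  ? arc_released+0x11/0x30 [zfs]
[  484.402901]  dbuf_write+0x219/0x710 [zfs]
[  484.407046]  dbuf_sync_leaf+0x190/0x890 [zfs]
EOF
# prints:
#   dbuf_write+0x219/0x710
#   dbuf_sync_leaf+0x190/0x890
```

Each printed frame can then be handed to a symbolizer such as the kernel source tree's scripts/faddr2line together with a zfs.ko built with debug info. Where that module file lives depends on the build, so no path is given here.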
@tonyhutter tonyhutter added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 30, 2024