Kernel Panic Mode when sctp goes via a interface and IPSec #227

Open
2 tasks done
snorlaxrino opened this issue Nov 12, 2024 · 9 comments
Labels
upstream Third party issue

Comments

@snorlaxrino

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

OPNsense crashes, and the crash seems to be related to SCTP traffic going through a VPN. After some investigation we suspect that an object is NULL or incorrectly populated here: https://github.com/opnsense/src/blob/stable/24.7/sys/netpfil/pf/pf.c#L7944
It seems to occur only when SCTP is combined with a VPN. I ran two tests, one with an IPsec site-to-site tunnel and one with OpenVPN TAP, and with both VPNs the crash occurred at the same place.
The error stops only when the VPN is deactivated.
I did not use this setup before 24.7.

To Reproduce

The SCTP packets go through an IPsec tunnel. As soon as I activate the tunnel, OPNsense crashes. After a restart, OPNsense runs for about 15 minutes until it crashes again. The VPN is site-to-site.

Expected behavior

No kernel panic in this case.

Relevant log files

--- trap 0xc, rip = 0xffffffff821ab744, rsp = 0xfffffe00625cef50, rbp = 0xfffffe00625cef50 ---
pfi_kkif_match() at pfi_kkif_match+0x24/frame 0xfffffe00625cef50
pf_test_rule() at pf_test_rule+0xe6b/frame 0xfffffe00625cf3a0
pf_sctp_multihome_delayed() at pf_sctp_multihome_delayed+0x30e/frame 0xfffffe00625cf4d0
pf_test() at pf_test+0xd1a/frame 0xfffffe00625cf680
pf_check_in() at pf_check_in+0x27/frame 0xfffffe00625cf6a0
pfil_mbuf_in() at pfil_mbuf_in+0x38/frame 0xfffffe00625cf6d0
enc_hhook() at enc_hhook+0x28a/frame 0xfffffe00625cf710
hhook_run_hooks() at hhook_run_hooks+0x61/frame 0xfffffe00625cf780
ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe00625cf7a0
ipsec4_common_input_cb() at ipsec4_common_input_cb+0x32a/frame 0xfffffe00625cf830
esp_input_cb() at esp_input_cb+0x430/frame 0xfffffe00625cf8e0
swcr_process() at swcr_process+0x25/frame 0xfffffe00625cf900
crypto_dispatch() at crypto_dispatch+0x60/frame 0xfffffe00625cf920
esp_input() at esp_input+0x4d8/frame 0xfffffe00625cf9f0
udp_ipsec_input() at udp_ipsec_input+0x17b/frame 0xfffffe00625cfa50
ipsec_kmod_udp_input() at ipsec_kmod_udp_input+0x2d/frame 0xfffffe00625cfa70
udp_append() at udp_append+0xe4/frame 0xfffffe00625cfae0
udp_input() at udp_input+0x803/frame 0xfffffe00625cfbc0
ip_input() at ip_input+0x268/frame 0xfffffe00625cfc20
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe00625cfc70
ether_demux() at ether_demux+0x149/frame 0xfffffe00625cfca0
ether_nh_input() at ether_nh_input+0x36a/frame 0xfffffe00625cfd00
netisr_dispatch_src() at netisr_dispatch_src+0x9e/frame 0xfffffe00625cfd50
ether_input() at ether_input+0x56/frame 0xfffffe00625cfda0
re_rxeof() at re_rxeof+0x547/frame 0xfffffe00625cfe20
re_intr_msi() at re_intr_msi+0xf3/frame 0xfffffe00625cfe60
ithread_loop() at ithread_loop+0x257/frame 0xfffffe00625cfef0
fork_exit() at fork_exit+0x7f/frame 0xfffffe00625cff30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe00625cff30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
panic.txt: page fault
version.txt: FreeBSD 14.1-RELEASE-p6 stable/24.7-n267939-fd5bc7f34e1 SMP
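
As a reading aid for the trace above, here is a minimal Python sketch that extracts the function names and offsets from KDB backtrace lines and prints the faulting frame and the call path. The names `FRAME_RE` and `parse_frames` are hypothetical helpers written for this illustration and are not part of any OPNsense or FreeBSD tooling; the sample input is the first few frames copied from the report.

```python
import re

# Matches backtrace lines of the form:
#   pf_test_rule() at pf_test_rule+0xe6b/frame 0xfffffe00625cf3a0
FRAME_RE = re.compile(r"^(\w+)\(\) at \1\+0x([0-9a-f]+)/frame 0x([0-9a-f]+)$")


def parse_frames(text):
    """Return a list of (function, offset) tuples, innermost frame first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((m.group(1), int(m.group(2), 16)))
    return frames


# First frames of the trace from the report (the "--- trap" line is skipped).
SAMPLE = """\
--- trap 0xc, rip = 0xffffffff821ab744, rsp = 0xfffffe00625cef50, rbp = 0xfffffe00625cef50 ---
pfi_kkif_match() at pfi_kkif_match+0x24/frame 0xfffffe00625cef50
pf_test_rule() at pf_test_rule+0xe6b/frame 0xfffffe00625cf3a0
pf_sctp_multihome_delayed() at pf_sctp_multihome_delayed+0x30e/frame 0xfffffe00625cf4d0
pf_test() at pf_test+0xd1a/frame 0xfffffe00625cf680
pf_check_in() at pf_check_in+0x27/frame 0xfffffe00625cf6a0
pfil_mbuf_in() at pfil_mbuf_in+0x38/frame 0xfffffe00625cf6d0
enc_hhook() at enc_hhook+0x28a/frame 0xfffffe00625cf710
"""

frames = parse_frames(SAMPLE)
names = [name for name, _ in frames]

# The first parsed frame is where the page fault was taken.
print("faulting frame:", frames[0][0], hex(frames[0][1]))
# Reverse the list to read the path in call order.
print("call path:", " -> ".join(reversed(names)))
```

Read in call order, the path shows the packet entering pf from the IPsec `enc` hook and faulting in `pfi_kkif_match()` via `pf_sctp_multihome_delayed()`.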

Additional context

Uploaded through crash reporter

Environment

OPNsense 24.7.8-amd64
FreeBSD 14.1-RELEASE-p6
OpenSSL 3.0.15
AMD G-T40E Processor (2 cores, 2 threads)

@fichtner fichtner added the upstream Third party issue label Nov 12, 2024
@fichtner
Member

Feel free to send me a vmcore file from a debug kernel crash:

# opnsense-update -zkr dbg-24.7.8 && opnsense-shell reboot

That being said, SCTP being unreliable is clearly FreeBSD territory. There are no relevant commits on stable/14 to my knowledge.

Cheers,
Franco

@snorlaxrino
Author

Hey Franco,
the vmcore0 is too big to upload here.
I can upload it somewhere if you have something available, otherwise I can share it via OneDrive.
Here is an extract that may help.
Or you can also tell me exactly what information you need.

Fatal trap 12: page fault while in kernel mode
cpuid = 1; apic id = 01
fault virtual address = 0x18
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff82388a0d
stack pointer = 0x28:0xfffffe006259ae80
frame pointer = 0x28:0xfffffe006259ae90
code segment = base 0x0, limit 0xfffff, type 0x1b
= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 12 (irq28: re1)
rdi: ffffffff81ddc480 rsi: 0000000000000000 rdx: fffff80034308078
rcx: 0000000000000000 r8: 00000000ffffffdb r9: 0000000000000010
rax: 0000000000000001 rbx: fffff80005fe2600 rbp: fffffe006259ae90
r10: 0000000000000000 r11: 0000000000000000 r12: fffff80005b1ee00
r13: fffff8009eb4eb10 r14: fffff80005b1ee00 r15: fffff800035b7740
trap number = 12
panic: page fault
cpuid = 1
time = 1731418088
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe006259ab70
vpanic() at vpanic+0x131/frame 0xfffffe006259aca0
panic() at panic+0x43/frame 0xfffffe006259ad00
trap_fatal() at trap_fatal+0x40b/frame 0xfffffe006259ad60
trap_pfault() at trap_pfault+0x57/frame 0xfffffe006259adb0
calltrap() at calltrap+0x8/frame 0xfffffe006259adb0
--- trap 0xc, rip = 0xffffffff82388a0d, rsp = 0xfffffe006259ae80, rbp = 0xfffffe006259ae90 ---
pfi_kkif_match() at pfi_kkif_match+0x3d/frame 0xfffffe006259ae90
pf_test_rule() at pf_test_rule+0xe43/frame 0xfffffe006259b2d0
pf_sctp_multihome_delayed() at pf_sctp_multihome_delayed+0x314/frame 0xfffffe006259b400
pf_test() at pf_test+0x10f9/frame 0xfffffe006259b5b0
pf_check_in() at pf_check_in+0x27/frame 0xfffffe006259b5d0
pfil_mbuf_in() at pfil_mbuf_in+0x58/frame 0xfffffe006259b610
enc_hhook() at enc_hhook+0x28a/frame 0xfffffe006259b650
hhook_run_hooks() at hhook_run_hooks+0x6f/frame 0xfffffe006259b6c0
ipsec_run_hhooks() at ipsec_run_hhooks+0x6d/frame 0xfffffe006259b6e0
ipsec4_common_input_cb() at ipsec4_common_input_cb+0x3e4/frame 0xfffffe006259b770
esp_input_cb() at esp_input_cb+0x5bd/frame 0xfffffe006259b830
swcr_process() at swcr_process+0x25/frame 0xfffffe006259b850
crypto_invoke() at crypto_invoke+0x7c/frame 0xfffffe006259b8c0
crypto_dispatch_one() at crypto_dispatch_one+0xf4/frame 0xfffffe006259b8f0
esp_input() at esp_input+0x57e/frame 0xfffffe006259b9c0
udp_ipsec_input() at udp_ipsec_input+0x197/frame 0xfffffe006259ba20
ipsec_kmod_udp_input() at ipsec_kmod_udp_input+0x2d/frame 0xfffffe006259ba40
udp_append() at udp_append+0x112/frame 0xfffffe006259bab0
udp_input() at udp_input+0x823/frame 0xfffffe006259bba0
ip_input() at ip_input+0x2e0/frame 0xfffffe006259bc00
netisr_dispatch_src() at netisr_dispatch_src+0xae/frame 0xfffffe006259bc60
ether_demux() at ether_demux+0x179/frame 0xfffffe006259bc90
ether_nh_input() at ether_nh_input+0x3e9/frame 0xfffffe006259bce0
netisr_dispatch_src() at netisr_dispatch_src+0xae/frame 0xfffffe006259bd40
ether_input() at ether_input+0x155/frame 0xfffffe006259bda0
re_rxeof() at re_rxeof+0x575/frame 0xfffffe006259be20
re_intr_msi() at re_intr_msi+0xc3/frame 0xfffffe006259be60
ithread_loop() at ithread_loop+0x256/frame 0xfffffe006259bef0
fork_exit() at fork_exit+0x82/frame 0xfffffe006259bf30
fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe006259bf30
--- trap 0, rip = 0, rsp = 0, rbp = 0 ---
KDB: enter: panic
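
Two fields in the dump above can be decoded quickly. The following Python sketch is only a heuristic reading aid: it assumes 4 KiB base pages, treats a fault address inside the first page as the usual signature of reading a struct member through a NULL pointer, and interprets the `time =` field as Unix epoch seconds. The helper name `looks_like_null_deref` is hypothetical.

```python
from datetime import datetime, timezone

PAGE_SIZE = 4096  # assumption: 4 KiB base pages on amd64


def looks_like_null_deref(fault_addr, page_size=PAGE_SIZE):
    """Heuristic: a fault in the first page usually means NULL + field offset."""
    return 0 <= fault_addr < page_size


# "fault virtual address = 0x18" from the dump
fault_addr = int("0x18", 16)

# "time = 1731418088" from the dump, read as Unix epoch seconds
panic_time = datetime.fromtimestamp(1731418088, tz=timezone.utc)

print("likely NULL dereference:", looks_like_null_deref(fault_addr))
print("member offset in bytes:", fault_addr)
print("panic time (UTC):", panic_time.isoformat())
```

Under these assumptions the fault reads a member 24 bytes into a structure reached through a NULL pointer, and the panic timestamp falls on the day the issue was opened.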

Best regards,
Richi

@fichtner
Member

Hey Richi,

Yeah, the vmcore is last resort. Can you share via onedrive, just drop me a line at [email protected] -- highly appreciated!

Cheers,
Franco

@snorlaxrino
Author

Hello Franco,

Just to make sure, did you receive my link by email?

Best regards,
Richi

@fichtner
Member

Hi Richi,

Thanks for following up. Did not receive an email indeed. Can you try to resend?

Thanks,
Franco

@fichtner
Member

Got it now, thanks!

@fichtner
Member

Ok I think this is involved in the NULL dereference happening here:

38663ae5ccc2b83

If you set Firewall: Settings: Advanced: Bind states to interface -- do the crashes still occur?

Cheers,
Franco

@snorlaxrino
Author

Hi Franco,
it still crashes. I sent you a new dump by email.
Best regards,
Richi

@fichtner
Member

Hi Richi,

Can you try this kernel? It is an immediate fix to the crash location but I'm not sure if the larger issue appears somewhere else afterwards:

# opnsense-update -zkr 24.7.8-sctp

Cheers,
Franco
