Kernel panic after upgrade to 24.1 with HA state synchronization enabled #204

Open
2 tasks done
Kishi85 opened this issue Apr 29, 2024 · 1 comment

Kishi85 commented Apr 29, 2024

Important notices

Before you add a new report, we ask you kindly to acknowledge the following:

Describe the bug

After upgrading the passive/backup primary node of the firewall cluster from 23.7 to 24.1 (the secondary had been upgraded beforehand), it panics while bringing up the interfaces (this seems to happen specifically on the cluster interface) with the following stack trace (cropped due to serial terminal limits; a full crash report was submitted through the WebUI after working around the issue):

lo0: link state changed to UP
[fib_algo] inet.0 (bsearch4#32) rebuild_fd_flm: switching algo to radix4_lockless                                                                                                                                                                                               
Sleeping thread (tid 100538, pid 95063) owns a non-sleepable lock                                                                                                                                                                                                               
KDB: stack backtrace of thread 100538:                                                                                                                                                                                                                                          
sched_switch() at sched_switch+0x818/frame 0xfffffe0247de3a10                                                                                                                                                                                                                   
mi_switch() at mi_switch+0xc2/frame 0xfffffe0247de3a30                                                                                                                                                                                                                          
_sx_xlock_hard() at _sx_xlock_hard+0x3e4/frame 0xfffffe0247de3ae0                                                                                                                                                                                                               
in_leavegroup() at in_leavegroup+0x80/frame 0xfffffe0247de3b10                                                                                                                                                                                                                  
pfsync_multicast_cleanup() at pfsync_multicast_cleanup+0x2b/frame 0xfffffe0247de3b40                                                                                                                                                                                            
pfsyncioctl() at pfsyncioctl+0x6fd/frame 0xfffffe0247de3bc0                                                                                                                                                                                                                     
ifioctl() at ifioctl+0x7bc/frame 0xfffffe0247de3cc0                                                                                                                                                                                                                             
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe0247de3d30                                                                                                                                                                                                                       
sys_ioctl() at sys_ioctl+0x100/frame 0xfffffe0247de3e00                                                                                                                                                                                                                         
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe0247de3f30                                                                                                                                                                                                                 
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe0247de3f30                                                                                                                                                                                                      
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x17204d3321ca, rsp = 0x17204a309e78, rbp = 0x17204a309ec0 ---                                                                                                                                                                    
panic: sleeping thread                                                                                                                                                                                                                                                          
cpuid = 6                                                                                                                                                                                                                                                                       
time = 1714383055                                                                                                                                                                                                                                                               
KDB: stack backtrace:                                                                                                                                                                                                                                                           
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe02137c7980                                                                                                                                                                                                  
vpanic() at vpanic+0x151/frame 0xfffffe02137c79d0                                                                                                                                                                                                                               
panic() at panic+0x43/frame 0xfffffe02137c7a30                                                                                                                                                                                                                                  
propagate_priority() at propagate_priority+0x296/frame 0xfffffe02137c7a70                                                                                                                                                                                                       
turnstile_wait() at turnstile_wait+0x323/frame 0xfffffe02137c7ab0                                                                                                                                                                                                               
__mtx_lock_sleep() at __mtx_lock_sleep+0x180/frame 0xfffffe02137c7b40                                                                                                                                                                                                           
pfsyncioctl() at pfsyncioctl+0x91b/frame 0xfffffe02137c7bc0                                                                                                                                                                                                                     
ifioctl() at ifioctl+0x803/frame 0xfffffe02137c7cc0                                                                                                                                                                                                                             
kern_ioctl() at kern_ioctl+0x26d/frame 0xfffffe02137c7d30                                                                                                                                                                                                                       
sys_ioctl() at sys_ioctl+0x100/frame 0xfffffe02137c7e00                                                                                                                                                                                                                         
amd64_syscall() at amd64_syscall+0x10c/frame 0xfffffe02137c7f30                                                                                                                                                                                                                 
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfffffe02137c7f30                                                                                                                                                                                                      
--- syscall (54, FreeBSD ELF64, ioctl), rip = 0x2e515b92d1ca, rsp = 0x2e5157537fc8, rbp = 0x2e5157538000 ---                                                                                                                                                                    
KDB: enter: panic                                                                                                                                                                                                                                                               
[ thread pid 98568 tid 100537 ]                                                                                                                                                                                                                                                 
Stopped at      kdb_enter+0x37: movq    $0,0x1217e0e(%rip)                                                                                                                                                                                                                      
db:0:kdb.enter.default> textdump set                                                                                                                                                                                                                                            
textdump set                                                                                                                                                                                                                                                                    
db:0:kdb.enter.default>  capture on                                                                                                                                                                                                                                             
db:0:kdb.enter.default>  run lockinfo                                                                                                                                                                                                                                           
db:1:lockinfo> show locks                                                                                                                                                                                                                                                       
No such command; use "help" to list available commands                                                                                                                                                                                                                          
db:1:lockinfo>  show alllocks                                                                                                                                                                                                                                                   
No such command; use "help" to list available commands                                                                                                                                                                                                                          
db:1:lockinfo>  show lockedvnods                                                                                                                                                                                                                                                
Locked vnodes                                                                                                                                                                                                                                                                   
db:0:kdb.enter.default>  show pcpu                                                                                                                                                                                                                                              
cpuid        = 6                                                                                                                                                                                                                                                                
dynamic pcpu = 0xfffffe0154d6e300                                                                                                                                                                                                                                               
curthread    = 0xfffffe0214869740: pid 98568 tid 100537 critnest 1 "ifconfig"                                                                                                                                                                                                   
curpcb       = 0xfffffe0214869c50                                                                                                                                                                                                                                               
fpcurthread  = 0xfffffe0214869740: pid 98568 "ifconfig"                                                                                                                                                                                                                         
idlethread   = 0xfffffe017e889c80: tid 100009 "idle: cpu6"                                                                                                                                                                                                                      
self         = 0xffffffff82e16000                                                                                                                                                                                                                                               
curpmap      = 0xfffffe026506ab20                                                                                                                                                                                                                                               
tssp         = 0xffffffff82e16384                                                                                                                                                                                                                                               
rsp0         = 0xfffffe02137c8000                                                                                                                                                                                                                                               
kcr3         = 0x241bd8000                                                                                                                                                                                                                                                      
ucr3         = 0x241a2b000                                                                                                                                                                                                                                                      
scr3         = 0x241a2b000                                                                                                                                                                                                                                                      
gs32p        = 0xffffffff82e16404                                                                                                                                                                                                                                               
ldt          = 0xffffffff82e16444                                                                                                                                                                                                                                               
tss          = 0xffffffff82e16434                                                                                                                                                                                                                                               
curvnet      = 0xfffff80101648c40                                                                                                                                                                                                                                               
db:0:kdb.enter.default>  bt                                                                                                                                                                                                                                                     
Tracing pid 98568 tid 100537 td 0xfffffe0214869740                                                                                                                                                                                                                              
kdb_enter() at kdb_enter+0x37/frame 0xfffffe02137c7980                                                                                                                                                                                                                          
vpanic() at vpanic+0x182/frame 0xfffffe02137c79d0                                                                                                                                                                                                                               
panic() at panic+0x43/frame 0xfffffe02137c7a30                                                                                                                                                                                                                                  
propagate_priority() at propagate_priority+0x296/frame 0xfffffe02137c7a70                                                                                                                                                                                                       
turnstile_wait() at turnstile_wait+0x323/frame 0xfffffe02137c7ab0 
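
For readers skimming the trace, here is a minimal sketch that pulls the call chain out of KDB backtrace lines. It assumes the `name() at name+0xoff/frame 0x...` line format seen above; `FRAME_RE` and `call_chain` are illustrative names, not part of any FreeBSD or OPNsense tool. Fed the frames of the sleeping thread (tid 100538), it shows that the thread went to sleep in `_sx_xlock_hard`, reached via `in_leavegroup` and `pfsync_multicast_cleanup` from `pfsyncioctl`.

```python
import re

# Matches KDB backtrace frames such as
# "in_leavegroup() at in_leavegroup+0x80/frame 0xfffffe0247de3b10".
FRAME_RE = re.compile(r"^(\w+)\(\) at \w+\+0x[0-9a-f]+/frame 0x[0-9a-f]+\s*$")


def call_chain(trace):
    """Return the function names in a trace, innermost frame first."""
    names = []
    for line in trace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:  # non-frame lines (panic message, cpuid, ...) are skipped
            names.append(match.group(1))
    return names


# Frames copied from the sleeping thread (tid 100538) in the report above.
SLEEPING_THREAD = """\
sched_switch() at sched_switch+0x818/frame 0xfffffe0247de3a10
mi_switch() at mi_switch+0xc2/frame 0xfffffe0247de3a30
_sx_xlock_hard() at _sx_xlock_hard+0x3e4/frame 0xfffffe0247de3ae0
in_leavegroup() at in_leavegroup+0x80/frame 0xfffffe0247de3b10
pfsync_multicast_cleanup() at pfsync_multicast_cleanup+0x2b/frame 0xfffffe0247de3b40
pfsyncioctl() at pfsyncioctl+0x6fd/frame 0xfffffe0247de3bc0
ifioctl() at ifioctl+0x7bc/frame 0xfffffe0247de3cc0
"""

chain = call_chain(SLEEPING_THREAD)
print(" <- ".join(chain))
```

The second trace (the panicking thread, tid 100537) can be passed in the same way; there `pfsyncioctl` is followed directly by `__mtx_lock_sleep`.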

To Reproduce

Steps to reproduce the behavior:

  1. Upgrade secondary node from 23.7 to 24.1
  2. Switch over active/master to secondary node
  3. Upgrade primary node from 23.7 to 24.1 and let it reboot

Expected behavior

The primary node should upgrade and reboot without issues while HA state synchronization is enabled.

Describe alternatives you considered

After disabling HA state synchronization on the secondary, the primary node boots without problems.
Failover is not smooth because states get lost, but it works for now.

Relevant log files
See stack trace above. Full crash report was submitted after boot succeeded using Firmware/Reporter.

Environment

Software version used and hardware type if relevant, e.g.:

OPNsense 24.1.6-amd64
FreeBSD 13.2-RELEASE-p11
OpenSSL 3.0.13

Running directly on a Dell PowerEdge R6515 with 4x Broadcom Adv. Dual 25Gb Ethernet adapters (everything on the latest available firmware).

Kishi85 commented Aug 5, 2024

This also happens when upgrading to kernel 24.1.8, but I managed to work around it by setting the respective other firewall node as the unicast sync target IP in the UI, after which the boot loop stopped.
