Waiting for tunxxx to become free #18

bernhardschmidt · 2023-01-05T09:20:42Z

As far as I understand this is already known?

When stopping an OpenVPN instance, the kernel log is getting messages like


Jan  5 10:14:12 eduvpn-n09 kernel: [94308.032155] unregister_netdevice: waiting for tun321 to become free. Usage count = 1
Jan  5 10:14:12 eduvpn-n09 kernel: [94308.416149] unregister_netdevice: waiting for tun263 to become free. Usage count = 2

The text was updated successfully, but these errors were encountered:

ordex · 2023-01-05T09:38:13Z

Thanks for the report! It's probably the same as #17
However, it'd be great to have a way to reproduce it, because the message is just a generic indication that the interface is still referenced by "something".

bernhardschmidt · 2023-01-09T10:49:53Z

Just had an instance of #16 , where all processes died

Jan  9 11:43:47 eduvpn-n09 openvpn[601]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[607]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[606]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[608]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[600]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[603]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[602]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[604]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[605]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[599]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)

Of the 10 running processes 8 were restarted by systemd just fine. Two of them are in an uninterruptable sleep (D), and those are the two processes where the warning is logged.

bernhardschmidt · 2023-01-09T10:51:28Z

These are the last messages of a broken server

Jan  9 11:43:47 eduvpn-n09 openvpn[600]: dco_do_read: netlink out of memory error: No buffer space available (errno=105)
Jan  9 11:43:47 eduvpn-n09 openvpn[600]: Exiting due to fatal error
Jan  9 11:43:47 eduvpn-n09 openvpn[600]: Closing DCO interface
Jan  9 11:43:47 eduvpn-n09 openvpn[600]: net_addr_v4_del: 129.187.98.129 dev tun321
Jan  9 11:43:47 eduvpn-n09 openvpn[600]: net_addr_v6_del: 2001:4ca0:2fff:2:3:1:1:1/112 dev tun321
Jan  9 11:43:47 eduvpn-n09 openvpn[600]: net_iface_del: delete tun321

ordex · 2023-03-24T22:35:14Z

It's very possible that this issue was solved by the latest changes we implemented.
That reworked the control packets flow and moved them away from netlink.

Any chance you could try to reproduce this one again with openvpn-2.6.2 and the latest master for ovpn-dco?

Thanks a lot

bernhardschmidt · 2023-03-31T12:35:50Z

This is still happening with 2.6.2 and the new DCO

[287938.551622] unregister_netdevice: waiting for tun321 to become free. Usage count = 23
[287938.839851] unregister_netdevice: waiting for tun263 to become free. Usage count = 64

Happened on openvpn stop, process is in D state and unkillable

nobody       602  0.2  2.8 115712 112232 ?       Ds   Mar28  12:23 /usr/sbin/openvpn --status /run/openvpn-server/status-tum-full-node-03-1.log --status-version 2 --suppress-timestamps --config tum-full-node-03-1.conf
nobody       610  0.2  1.9  79528 76144 ?        Ds   Mar28  12:43 /usr/sbin/openvpn --status /run/openvpn-server/status-tum-split-node-03-7.log --status-version 2 --suppress-timestamps --config tum-split-node-03-7.conf

ordex · 2023-03-31T12:40:47Z

@bernhardschmidt thanks for reporting. The process is probably stuck on the iface-delete call, which won't return, because the networking stack is busy waiting for tunxxx to be released. Which probably is not going to happen.
If you don't stop OpenVPN it won't happen because there would be no hanging call.

This is due to some refcounting unbalance...

Did it happen with TCP or UDP?

bernhardschmidt · 2023-03-31T12:55:07Z

Good catch, the two hanging processes (out of 10) are the ones listening to TCP.

hildeb · 2023-04-25T07:51:06Z

I'm also encountering the issue with the hanging interface.
An TCP instance of 2.6.3 was being stopped:

...
Apr 25 08:36:10 openvpn-gw162-int openvpn-tcp[1623]: dco_parse_peer_multi: cannot store DCO stats for peer 32
Apr 25 08:36:13 openvpn-gw162-int openvpn-tcp[1623]: Closing DCO interface
Apr 25 08:36:13 openvpn-gw162-int openvpn-tcp[1623]: net_addr_v4_del: 172.29.136.1 dev tun1
Apr 25 08:36:13 openvpn-gw162-int openvpn-tcp[1623]: net_iface_del: delete tun1

after that systemd was complaining:

Apr 25 08:37:40 openvpn-gw162-int systemd[1]: [email protected]: State 'stop-sigterm' timed out. Killing.
Apr 25 08:37:40 openvpn-gw162-int systemd[1]: [email protected]: Killing process 1623 (openvpn) with signal SIGKILL.
Apr 25 08:39:10 openvpn-gw162-int systemd[1]: [email protected]: Processes still around after SIGKILL. Ignoring.
Apr 25 08:40:40 openvpn-gw162-int systemd[1]: [email protected]: State 'stop-final-sigterm' timed out. Killing.
Apr 25 08:40:40 openvpn-gw162-int systemd[1]: [email protected]: Killing process 1623 (openvpn) with signal SIGKILL.
Apr 25 08:42:10 openvpn-gw162-int systemd[1]: [email protected]: Processes still around after final SIGKILL. Entering failed mode.
Apr 25 08:42:10 openvpn-gw162-int systemd[1]: [email protected]: Failed with result 'timeout'.
Apr 25 08:42:10 openvpn-gw162-int systemd[1]: [email protected]: Found left-over process 1623 (openvpn) in control group while starting unit. Ignoring.
Apr 25 08:42:11 openvpn-gw162-int systemd[1]: [email protected]: Found left-over process 1623 (openvpn) in control group while starting unit. Ignoring.

Subsequently, I wasn't able to start, since:

Apr 25 08:42:11 openvpn-gw162-int openvpn-tcp[69606]: MANAGEMENT: Socket bind failed on local address [AF_INET]127.0.0.1:7506: Address already in use (errno=98)

Looking at kern.log, I found:

Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.124095] tun1: deleting peer with id 23, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.124118] tun1: deleting peer with id 22, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.124678] BUG: kernel NULL pointer dereference, address: 000000000000021c
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.124872] #PF: supervisor read access in kernel mode
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.124963] #PF: error_code(0x0000) - not-present page
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.125058] PGD 2b4d0067 P4D 2b4d0067 PUD 2b381067 PMD 0 
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.125164] Oops: 0000 [#1] SMP NOPTI
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.125271] CPU: 0 PID: 53946 Comm: kworker/0:1 Tainted: G        W  OE     5.15.0-58-generic #64~20.04.1-Ubuntu
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.125586] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 12/12/2018
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.125906] Workqueue: ovpn-crypto-wq-tun1 ovpn_decrypt_work [ovpn_dco_v2]
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.126090] RIP: 0010:ovpn_decrypt_work+0x243/0x610 [ovpn_dco_v2]
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.126279] Code: 48 8b 35 80 b6 1f ea 89 c0 48 8d bb f8 05 00 00 48 01 c6 e8 cf 4d 56 e8 48 8b 83 10 04 00 00 48 8b 80 00 01 00 00 48 8b 
40 18 <66> 83 b8 1c 02 00 00 11 0f 84 fe 00 00 00 41 8b 44 24 70 f0 48 01
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.126934] RSP: 0018:ffffb7a280c43e20 EFLAGS: 00010286
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.127181] RAX: 0000000000000000 RBX: ffff9ece590a1880 RCX: 00000000000000d4
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.127443] RDX: ffffb7a280c43dd8 RSI: 0000000000000286 RDI: ffff9ecf3bc207c0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.127695] RBP: ffffb7a280c43e58 R08: 0000000000000000 R09: 0000000000000000
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.127959] R10: ffff9ece2a6c0840 R11: 0000000000000010 R12: ffff9ecf18b46b00
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.128236] R13: ffff9ece590a19c8 R14: ffff9ece044def90 R15: 0000000000000000
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.128506] FS:  0000000000000000(0000) GS:ffff9ecf3bc00000(0000) knlGS:0000000000000000
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.128786] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.129078] CR2: 000000000000021c CR3: 00000000254ba004 CR4: 00000000007706f0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.129401] PKRU: 55555554
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.129709] Call Trace:
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.130009]  <TASK>
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.130311]  process_one_work+0x228/0x3d0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.130643]  worker_thread+0x4d/0x3f0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.130969]  ? process_one_work+0x3d0/0x3d0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.131305]  kthread+0x127/0x150
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.131639]  ? set_kthread_struct+0x50/0x50
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.131984]  ret_from_fork+0x1f/0x30
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.132384]  </TASK>
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.132719] Modules linked in: sctp ovpn_dco_v2(OE) ip6_udp_tunnel udp_tunnel tcp_diag inet_diag nfnetlink_log vsock_loopback vmw_vsock_vi
rtio_transport_common vmw_vsock_vmci_transport vsock dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua nft_reject_ipv4 nft_counter binfmt_misc crct10dif_pclmul vmw_balloon crc32_pclmul ghash_clmulni_intel nf_log_syslog ahci vmw_vmci joydev vmxnet3 libahci nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_log nft_ct nf_tables nfnetlink sch_fq_codel msr nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 efi_pstore ip_tables x_tables autofs4 raid10 raid456 libcrc32c async_raid6_recov async_memcpy async_pq async_xor xor async_tx raid6_pq raid1 raid0 multipath linear vmwgfx ttm input_leds drm_kms_helper aesni_intel syscopyarea sysfillrect mptspi sysimgblt scsi_transport_spi fb_sys_fops cec crypto_simd mptscsih rc_core psmouse cryptd serio_raw drm mptbase
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.136954] CR2: 000000000000021c
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.137397] ---[ end trace a021367094290bf0 ]---
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.137835] RIP: 0010:ovpn_decrypt_work+0x243/0x610 [ovpn_dco_v2]
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.138269] Code: 48 8b 35 80 b6 1f ea 89 c0 48 8d bb f8 05 00 00 48 01 c6 e8 cf 4d 56 e8 48 8b 83 10 04 00 00 48 8b 80 00 01 00 00 48 8b 40 18 <66> 83 b8 1c 02 00 00 11 0f 84 fe 00 00 00 41 8b 44 24 70 f0 48 01
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.139553] RSP: 0018:ffffb7a280c43e20 EFLAGS: 00010286
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.139972] RAX: 0000000000000000 RBX: ffff9ece590a1880 RCX: 00000000000000d4
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.140388] RDX: ffffb7a280c43dd8 RSI: 0000000000000286 RDI: ffff9ecf3bc207c0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.140814] RBP: ffffb7a280c43e58 R08: 0000000000000000 R09: 0000000000000000
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.141229] R10: ffff9ece2a6c0840 R11: 0000000000000010 R12: ffff9ecf18b46b00
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.141642] R13: ffff9ece590a19c8 R14: ffff9ece044def90 R15: 0000000000000000
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.142073] FS:  0000000000000000(0000) GS:ffff9ecf3bc00000(0000) knlGS:0000000000000000
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.142511] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.142922] CR2: 000000000000021c CR3: 00000000254ba004 CR4: 00000000007706f0
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.143384] PKRU: 55555554
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.143821] tun1: deleting peer with id 21, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.144304] tun1: deleting peer with id 19, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.144988] tun1: deleting peer with id 18, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.145522] tun1: deleting peer with id 17, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.145965] tun1: deleting peer with id 16, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.146411] tun1: deleting peer with id 15, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.146839] tun1: deleting peer with id 13, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.147245] tun1: deleting peer with id 12, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.147664] tun1: deleting peer with id 11, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.148192] tun1: deleting peer with id 0, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.151984] tun1: deleting peer with id 10, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.152477] tun1: deleting peer with id 9, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.152868] tun1: deleting peer with id 7, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.153264] tun1: deleting peer with id 6, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.153605] tun1: deleting peer with id 5, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.153940] tun1: deleting peer with id 4, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.154260] tun1: deleting peer with id 2, reason 1
Apr 25 08:36:10 openvpn-gw162-int kernel: [43533.154568] tun1: deleting peer with id 1, reason 1
Apr 25 08:36:23 openvpn-gw162-int kernel: [43546.267778] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:36:30 openvpn-gw162-int kernel: [43553.247650] tun0: deleting peer with id 69, reason 1
Apr 25 08:36:33 openvpn-gw162-int kernel: [43556.347620] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:36:43 openvpn-gw162-int kernel: [43566.427435] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:36:50 openvpn-gw162-int kernel: [43573.075332] tun0: deleting peer with id 77, reason 1
Apr 25 08:36:53 openvpn-gw162-int kernel: [43576.507298] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:37:03 openvpn-gw162-int kernel: [43586.587123] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:37:06 openvpn-gw162-int kernel: [43589.363069] tun0: deleting peer with id 79, reason 1
Apr 25 08:37:13 openvpn-gw162-int kernel: [43596.666984] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:37:23 openvpn-gw162-int kernel: [43606.746784] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:37:30 openvpn-gw162-int kernel: [43612.910705] tun0: deleting peer with id 70, reason 2
Apr 25 08:37:33 openvpn-gw162-int kernel: [43616.826641] unregister_netdevice: waiting for tun1 to become free. Usage count = 3
Apr 25 08:37:37 openvpn-gw162-int kernel: [43620.090577] tun0: deleting peer with id 82, reason 1
Apr 25 08:37:42 openvpn-gw162-int kernel: [43625.106540] tun0: deleting peer with id 84, reason 1
...

mr-liusg · 2023-05-13T12:50:17Z

I found a similar problem on my UDP server.
But it was hard to reproduce and I ran it for 2-3 days before the problem happened.

ordex · 2023-05-13T18:22:18Z

@mr-liusg this is nott he same issue. The messages you see are about incoming traffic for a peer that is not known.
If you feel this is a bug, please open another ticket.

mr-liusg · 2023-05-15T02:52:14Z

@mr-liusg this is nott he same issue. The messages you see are about incoming traffic for a peer that is not known. If you feel this is a bug, please open another ticket.

I looked at the logs in depth today. I believe the problem is similar to #13. I've posted my error stack over.

claw6148 · 2023-10-17T04:07:01Z

I can reproduce this issue in a stable way on my machine:

Run the ovpn server in tcp mode (ovpn: 2.6.6, dco: v0.2.20230426)
Connect to the ovpn server, and then perform the iperf3 upload test (from client to server)
Press Ctrl-C to exit the ovpn server during testing, and the problem reproduces

I added some dump_stack() and printk() in the DCO driver.
The dmesg logs below is a comparison of successful/failure exits of the ovpn server.
On failure scenario ovpn_peer_delete_work() is not executed.

dmesg logs

successful

[   93.195417] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[   93.201518] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[   93.209853]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4bb10 964647d0
[   93.218179]         00000000 00000000 808f0000 00000000 30232038 00000129 2e352064 33322e34
[   93.226500]         00000000 00000004 00000000 0002fb59 80000000 80770000 00000000 00000000
[   93.234821]         8f16d480 00000000 807683e0 8eabcd80 00000000 80369ec8 003d1a6f 003d93ef
[   93.243142]         ...
[   93.245577] Call Trace:
[   93.248047] [<8000b6cc>] show_stack+0x30/0x100
[   93.252512] [<805c7dcc>] dump_stack+0xa4/0xdc
[   93.256993] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[   93.263626] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[   93.268313] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[   93.273172] [<804a94d4>] genl_rcv+0x30/0x48
[   93.277340] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[   93.282285] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[   93.287249] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[   93.292109] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[   93.296801] [<80418658>] __sys_sendmsg+0x4c/0x94
[   93.301411] [<80014598>] syscall_common+0x34/0x58
[   93.307392] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[   93.313483] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[   93.321815]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4bb10 964647d0
[   93.330136]         00000000 00000000 808f0000 00000000 30232038 0000013d 2e352064 33322e34
[   93.338457]         00000000 00000004 00000000 0004b0c0 80000000 80770000 00000000 00000000
[   93.346777]         8f16d900 00000000 807683e0 8eabcd80 00000000 80369ec8 003d17af 003d932f
[   93.355099]         ...
[   93.357535] Call Trace:
[   93.360006] [<8000b6cc>] show_stack+0x30/0x100
[   93.364468] [<805c7dcc>] dump_stack+0xa4/0xdc
[   93.368942] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[   93.375569] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[   93.380256] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[   93.385116] [<804a94d4>] genl_rcv+0x30/0x48
[   93.389284] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[   93.394230] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[   93.399193] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[   93.404052] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[   93.408744] [<80418658>] __sys_sendmsg+0x4c/0x94
[   93.413351] [<80014598>] syscall_common+0x34/0x58
[   93.497190] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[   93.503281] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[   93.511609]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4bb10 964647d0
[   93.519932]         00000000 00000000 808f0000 00000000 30232038 00000151 2e352064 33322e34
[   93.528260]         00000000 00000004 00000000 00079626 80000000 80770000 00000000 00000000
[   93.536581]         8f16d480 00000000 807683e0 8eabcd80 00000000 80369ec8 00000008 808b0008
[   93.544905]         ...
[   93.547339] Call Trace:
[   93.549809] [<8000b6cc>] show_stack+0x30/0x100
[   93.554274] [<805c7dcc>] dump_stack+0xa4/0xdc
[   93.558746] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[   93.565379] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[   93.570066] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[   93.574925] [<804a94d4>] genl_rcv+0x30/0x48
[   93.579093] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[   93.584039] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[   93.589002] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[   93.593862] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[   93.598554] [<80418658>] __sys_sendmsg+0x4c/0x94
[   93.603162] [<80014598>] syscall_common+0x34/0x58
[  111.836167] *** Quit OVPN Here ***
[  114.369607] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  114.375703] Stack : 8e216700 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4ba84
[  114.384035]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4ba28 964647d0
[  114.392357]         00000000 00000000 808f0000 00000000 30232038 00000166 2e352064 33322e34
[  114.400681]         00000000 00000005 00000000 0005a3c7 80000000 80770000 00000000 8df31250
[  114.409006]         00000001 00000002 8df844b4 8df844b4 00000000 80369ec8 0803e098 808b0008
[  114.417327]         ...
[  114.419761] Call Trace:
[  114.422230] [<8000b6cc>] show_stack+0x30/0x100
[  114.426693] [<805c7dcc>] dump_stack+0xa4/0xdc
[  114.431172] [<8eac679c>] ovpn_peer_stats_init+0x183c/0x192c [ovpn_dco_v2]
[  114.438681] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  114.444752] Stack : 8e216700 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bbf4
[  114.453078]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4bb98 964647d0
[  114.461399]         00000000 00000000 808f0000 00000000 30232038 00000171 2e352064 33322e34
[  114.469720]         00000000 00000005 00000000 0006b199 80000000 80770000 00000000 8df31250
[  114.478040]         00000000 00000002 8df844b4 8df844b4 00000000 80369ec8 003d17af 003d932f
[  114.486361]         ...
[  114.488795] Call Trace:
[  114.491257] [<8000b6cc>] show_stack+0x30/0x100
[  114.495711] [<805c7dcc>] dump_stack+0xa4/0xdc
[  114.500151] [<8eac679c>] ovpn_peer_stats_init+0x183c/0x192c [ovpn_dco_v2]
[  114.508730] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  114.514822] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb2c
[  114.523155]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4bad0 964647d0
[  114.531476]         00000000 00000000 808f0000 00000000 30232038 0000017c 2e352064 33322e34
[  114.539797]         00000000 00000005 00000000 0007c33a 80000000 80770000 00000000 00000000
[  114.548117]         8f16df00 00000000 807683e0 8eabcd80 00000000 80369ec8 003d17af 003d932f
[  114.556438]         ...
[  114.558874] Call Trace:
[  114.561346] [<8000b6cc>] show_stack+0x30/0x100
[  114.565809] [<805c7dcc>] dump_stack+0xa4/0xdc
[  114.570277] [<8eac9234>] ovpn_peer_release_kref+0x14/0x7c [ovpn_dco_v2]
[  114.576916] [<8eac61b8>] ovpn_peer_stats_init+0x1258/0x192c [ovpn_dco_v2]
[  114.583954] CPU: 2 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  114.590021] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[  114.598347]         808b0000 8fe0c2f0 80750d63 806860e4 00000002 00000001 8df4bb10 964647d0
[  114.606668]         00000000 00000000 808f0000 00000000 30232038 00000188 2e352064 33322e34
[  114.614990]         00000000 00000005 00000000 0008e912 80000000 80770000 00000000 00000000
[  114.623312]         8f16df00 00000000 807683e0 8eabcd80 00000000 80369ec8 003d172f 003d176f
[  114.631634]         ...
[  114.634068] Call Trace:
[  114.636517] [<8000b6cc>] show_stack+0x30/0x100
[  114.640960] [<805c7dcc>] dump_stack+0xa4/0xdc
[  114.645356] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[  114.651988] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[  114.656676] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[  114.661535] [<804a94d4>] genl_rcv+0x30/0x48
[  114.665704] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[  114.670651] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[  114.675614] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[  114.680474] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[  114.685166] [<80418658>] __sys_sendmsg+0x4c/0x94
[  114.689776] [<80014598>] syscall_common+0x34/0x58
[  114.695137] ovpn_peer_delete_work() enter
[  114.699223] ovpn_tcp_socket_detach() enter
[  114.703574] ovpn_tcp_socket_detach() leave
[  114.707700] ovpn_peer_delete_work() mid
[  114.711556] ovs: deleting peer with id 0, reason 1
[  114.716396] ovpn_peer_delete_work() leave
[  114.732990] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 5.4.238 #0
[  114.738981] Stack : 807a42b8 800804d0 80700000 807067d4 80770000 8070679c 80705498 8fc11e84
[  114.747308]         808b0000 8fc3c8b0 80750d63 806860e4 00000002 00000001 8fc11e28 964647d0
[  114.755629]         00000000 00000000 808f0000 00000000 00000030 000001a2 342e3520 3833322e
[  114.763949]         00000000 00000005 00000000 000b2f3e 00000000 80770000 00000000 80756ac0
[  114.772269]         8fc11f20 00000000 00000000 0000000a 00000003 80369ec8 003d1a6f 003d93ef
[  114.780590]         ...
[  114.783025] Call Trace:
[  114.785496] [<8000b6cc>] show_stack+0x30/0x100
[  114.789960] [<805c7dcc>] dump_stack+0xa4/0xdc
[  114.794439] [<8eac27d8>] ovpn_send_data+0x43c/0x7a0 [ovpn_dco_v2]

failure

[  162.589850] CPU: 0 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  162.595949] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[  162.604279]         808b0000 8fe0c2f0 80750d63 806860e4 00000000 00000001 8df4bb10 964647d0
[  162.612599]         00000000 00000000 808f0000 00000000 30232038 00000230 2e352064 33322e34
[  162.620918]         00000000 00000007 00000000 0009001a 80000000 80770000 00000000 00000000
[  162.629237]         8ff79540 00000000 807683e0 8eabcd80 00000000 80369ec8 003d1a6f 003d93ef
[  162.637557]         ...
[  162.639990] Call Trace:
[  162.642460] [<8000b6cc>] show_stack+0x30/0x100
[  162.646920] [<805c7dcc>] dump_stack+0xa4/0xdc
[  162.651398] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[  162.658025] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[  162.662709] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[  162.667564] [<804a94d4>] genl_rcv+0x30/0x48
[  162.671730] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[  162.676672] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[  162.681633] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[  162.686489] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[  162.691177] [<80418658>] __sys_sendmsg+0x4c/0x94
[  162.695782] [<80014598>] syscall_common+0x34/0x58
[  162.701889] CPU: 0 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  162.707979] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[  162.716314]         808b0000 8fe0c2f0 80750d63 806860e4 00000000 00000001 8df4bb10 964647d0
[  162.724634]         00000000 00000000 808f0000 00000000 30232038 00000244 2e352064 33322e34
[  162.732953]         00000000 00000007 00000000 000ab5c1 80000000 80770000 00000000 00000000
[  162.741272]         8ff79480 00000000 807683e0 8eabcd80 00000000 80369ec8 003d11af 003d91af
[  162.749591]         ...
[  162.752026] Call Trace:
[  162.754492] [<8000b6cc>] show_stack+0x30/0x100
[  162.758952] [<805c7dcc>] dump_stack+0xa4/0xdc
[  162.763422] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[  162.770047] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[  162.774731] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[  162.779587] [<804a94d4>] genl_rcv+0x30/0x48
[  162.783753] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[  162.788696] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[  162.793656] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[  162.798512] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[  162.803200] [<80418658>] __sys_sendmsg+0x4c/0x94
[  162.807805] [<80014598>] syscall_common+0x34/0x58
[  162.815836] CPU: 0 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  162.821929] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[  162.830260]         808b0000 8fe0c2f0 80750d63 806860e4 00000000 00000001 8df4bb10 964647d0
[  162.838580]         00000000 00000000 808f0000 00000000 30232038 00000258 2e352064 33322e34
[  162.846900]         00000000 00000007 00000000 000c72dc 80000000 80770000 00000000 00000000
[  162.855219]         8ff79900 00000000 807683e0 8eabcd80 00000000 80369ec8 003d11af 003d91af
[  162.863539]         ...
[  162.865972] Call Trace:
[  162.868439] [<8000b6cc>] show_stack+0x30/0x100
[  162.872899] [<805c7dcc>] dump_stack+0xa4/0xdc
[  162.877372] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[  162.883997] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[  162.888680] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[  162.893537] [<804a94d4>] genl_rcv+0x30/0x48
[  162.897703] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[  162.902645] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[  162.907606] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[  162.912462] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[  162.917151] [<80418658>] __sys_sendmsg+0x4c/0x94
[  162.921757] [<80014598>] syscall_common+0x34/0x58
[  164.893558] *** Quit OVPN Here ***
[  177.440740] CPU: 0 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  177.446838] Stack : 8de35300 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4ba84
[  177.455186]         808b0000 8fe0c2f0 80750d63 806860e4 00000000 00000001 8df4ba28 964647d0
[  177.463539]         00000000 00000000 808f0000 00000000 30232038 0000026d 2e352064 33322e34
[  177.471893]         00000000 00000008 00000000 0006b9a4 80000000 80770000 00000000 8df31250
[  177.480239]         00000001 00000002 8df844b4 8df844b4 00000000 80369ec8 08042098 808b0000
[  177.488586]         ...
[  177.491037] Call Trace:
[  177.493533] [<8000b6cc>] show_stack+0x30/0x100
[  177.498030] [<805c7dcc>] dump_stack+0xa4/0xdc
[  177.502573] [<8eac679c>] ovpn_peer_stats_init+0x183c/0x192c [ovpn_dco_v2]
[  177.512047] CPU: 3 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  177.518143] Stack : 8e962800 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bbf4
[  177.526503]         808b0000 8fe0c2f0 80750d63 806860e4 00000003 00000001 8df4bb98 964647d0
[  177.534868]         00000000 00000000 808f0000 00000000 30232038 00000278 2e352064 33322e34
[  177.543234]         00000000 00000008 00000000 0007d02f 80000000 80770000 00000000 8df31250
[  177.551613]         00000000 00000002 8df844b4 8df844b4 00000000 80369ec8 0803e098 808b000c
[  177.559996]         ...
[  177.562463] Call Trace:
[  177.564962] [<8000b6cc>] show_stack+0x30/0x100
[  177.569489] [<805c7dcc>] dump_stack+0xa4/0xdc
[  177.574039] [<8eac679c>] ovpn_peer_stats_init+0x183c/0x192c [ovpn_dco_v2]
[  177.585732] CPU: 3 PID: 3337 Comm: openvpn Not tainted 5.4.238 #0
[  177.591837] Stack : 8df4bc74 800804d0 80700000 807067d4 80770000 8070679c 80705498 8df4bb6c
[  177.600208]         808b0000 8fe0c2f0 80750d63 806860e4 00000003 00000001 8df4bb10 964647d0
[  177.608572]         00000000 00000000 808f0000 00000000 30232038 00000283 2e352064 33322e34
[  177.616927]         00000000 00000008 00000000 0008f004 80000000 80770000 00000000 00000000
[  177.625277]         8feaf240 00000000 807683e0 8eabcd80 00000000 80369ec8 003d1a6f 003d93ef
[  177.633628]         ...
[  177.636085] Call Trace:
[  177.638573] [<8000b6cc>] show_stack+0x30/0x100
[  177.643058] [<805c7dcc>] dump_stack+0xa4/0xdc
[  177.647586] [<8eac9288>] ovpn_peer_release_kref+0x68/0x7c [ovpn_dco_v2]
[  177.654244] [<804aa770>] genl_rcv_msg+0x24c/0x4cc
[  177.658932] [<804a8c34>] netlink_rcv_skb+0xac/0x158
[  177.663794] [<804a94d4>] genl_rcv+0x30/0x48
[  177.667966] [<804a823c>] netlink_unicast+0x1c4/0x2e4
[  177.672911] [<804a86a8>] netlink_sendmsg+0x34c/0x4a4
[  177.677880] [<80416b28>] ____sys_sendmsg+0xc0/0x264
[  177.682741] [<80417438>] ___sys_sendmsg+0x7c/0xcc
[  177.687436] [<80418658>] __sys_sendmsg+0x4c/0x94
[  177.692051] [<80014598>] syscall_common+0x34/0x58
[  187.815984] unregister_netdevice: waiting for ovs to become free. Usage count = 1
[  197.896299] unregister_netdevice: waiting for ovs to become free. Usage count = 1
[  207.976583] unregister_netdevice: waiting for ovs to become free. Usage count = 1
[  218.057089] unregister_netdevice: waiting for ovs to become free. Usage count = 1

cron2 · 2023-10-17T06:28:13Z

Oh, this is exciting news. Especially having instructions how to reproduce this is really really good.

(My testing so far focused on heavy client connect/disconnect churn, which did trigger a few issues :-) - but yours seems to be a race condition between "(heavy) traffic inside the VPN" and "peer being torn down", which I did not specifically test yet)

claw6148 · 2023-10-17T08:01:39Z

Oh, this is exciting news. Especially having instructions how to reproduce this is really really good.

(My testing so far focused on heavy client connect/disconnect churn, which did trigger a few issues :-) - but yours seems to be a race condition between "(heavy) traffic inside the VPN" and "peer being torn down", which I did not specifically test yet)

Yes, just one client is enough to reproduce.
This is less likely to happen if I perform an iperf3 download test (from server to client).

ordex · 2023-10-17T08:54:40Z

Super exciting! will give it a try today!

claw6148 · 2023-10-17T10:15:27Z

Is it related to peer->refcount?
I added some outputs on ovpn_peer_hold() and ovpn_peer_put() as shown below

diff --git a/drivers/net/ovpn-dco/peer.h b/drivers/net/ovpn-dco/peer.h
index c4f72f6..a439e14 100644
--- a/drivers/net/ovpn-dco/peer.h
+++ b/drivers/net/ovpn-dco/peer.h
@@ -122,12 +122,16 @@ void ovpn_peer_release(struct ovpn_peer *peer);
 
 static inline bool ovpn_peer_hold(struct ovpn_peer *peer)
 {
+	printk("ovpn_peer_hold peer->refcount = %d\n", peer->refcount.refcount.refs.counter);
+	dump_stack();
 	return kref_get_unless_zero(&peer->refcount);
 }
 
 static inline void ovpn_peer_put(struct ovpn_peer *peer)
 {
 	kref_put(&peer->refcount, ovpn_peer_release_kref);
+	printk("ovpn_peer_put peer->refcount = %d\n", peer->refcount.refcount.refs.counter);
+	dump_stack();
 }
 
 static inline void ovpn_peer_keepalive_recv_reset(struct ovpn_peer *peer)

The refcount.log is the dmesg log.
At lines 5713 and 5725, ovpn_peer_hold() shows peer->refcount as 0 twice, but there is no call to ovpn_peer_put() between the two ovpn_peer_hold() calls.
Line 5739, refcount_t: underflow; use-after-free.

ordex · 2023-10-17T10:17:05Z

most likely yes. there must be some non-trivial imbalance that is triggered by a race condition..

ordex · 2023-10-17T10:18:17Z

At lines 5713 and 5725, ovpn_peer_hold() shows peer->refcount as 0 twice, but there is no call to ovpn_peer_put() between the two ovpn_peer_hold() calls.

This is not an issue per se. when _hold() is called and refcount is 0, the reference is simply not obtained because the peer is considered gone. This is why there is no put. refcount in this case stays at zero because the object has reached end of life

ordex · 2023-10-17T10:28:55Z

Mhh.. we have an ovpn_peer_hold() in ovpn_tcp_read_sock() where we don't check the return value.
This means that if the refcount was already 0 at this point, it will stay so (due to the explanation above), but the following put() will trigger an underflow.

Which is what is happening later in your log at line 5739 and indeed confirmed by line 5758:

5758 [ 107.155298] ovpn_peer_put peer->refcount = -1073741824

Now the thing is: we have that _hold() without return value check because we assume that at that point in time we must be already holding a ref to that peer. We should not be there if we don't hold a reference.
So this points to some imbalance in the way the TCP sock object holds a reference to the related peer....

claw6148 · 2023-10-17T10:50:33Z

The platform I'm using above is a router with an MT7621A CPU, I just tried testing the same way on a x64 VM but haven't reproduced the issue yet... :-(

ordex · 2023-10-17T11:21:30Z

still the hints you provided kinda point us towards some more circumvented direction. If I am unable to reproduce as well, I may setup an old MT7621 based router and give it a shot.

ordex · 2023-10-18T11:43:40Z

Interesting enough, on my virtual setup I can reproduce the "Waiting for tunX to become free" issue by following your steps. But I don't see the underflow.
They may just be two symptoms of the same bug. It may just depend on timing.

claw6148 · 2023-10-18T12:32:00Z

I tried moving ovpn_peer_hold() from ovpn_tcp_read_sock() to ovpn_decrypt_work() and it seemed to be OK.

diff --git a/drivers/net/ovpn-dco/ovpn.c b/drivers/net/ovpn-dco/ovpn.c
index 772a34b..4a482f7 100644
--- a/drivers/net/ovpn-dco/ovpn.c
+++ b/drivers/net/ovpn-dco/ovpn.c
@@ -163,8 +163,7 @@ int ovpn_recv(struct ovpn_struct *ovpn, struct ovpn_peer *peer, struct sk_buff *
 	if (unlikely(ptr_ring_produce_bh(&peer->rx_ring, skb) < 0))
 		return -ENOSPC;
 
-	if (!queue_work(ovpn->crypto_wq, &peer->decrypt_work))
-		ovpn_peer_put(peer);
+	queue_work(ovpn->crypto_wq, &peer->decrypt_work);
 
 	return 0;
 }
@@ -263,6 +262,10 @@ void ovpn_decrypt_work(struct work_struct *work)
 	struct sk_buff *skb;
 
 	peer = container_of(work, struct ovpn_peer, decrypt_work);
+
+	if (!ovpn_peer_hold(peer))
+		return;
+
 	while ((skb = ptr_ring_consume_bh(&peer->rx_ring))) {
 		if (likely(ovpn_decrypt_one(peer, skb) == 0)) {
 			/* if a packet has been enqueued for NAPI, signal
diff --git a/drivers/net/ovpn-dco/tcp.c b/drivers/net/ovpn-dco/tcp.c
index 288a691..c794d1f 100644
--- a/drivers/net/ovpn-dco/tcp.c
+++ b/drivers/net/ovpn-dco/tcp.c
@@ -103,8 +103,7 @@ static int ovpn_tcp_read_sock(read_descriptor_t *desc, struct sk_buff *in_skb,
 			 * Queue skb for sending to userspace via recvmsg on the socket
 			 */
 			if (likely(ovpn_opcode_from_skb(peer->tcp.skb, 0) == OVPN_DATA_V2)) {
-				/* hold reference to peer as requird by ovpn_recv() */
-				ovpn_peer_hold(peer);
+				/* peer reference will be held in ovpn_decrypt_work() */
 				status = ovpn_recv(peer->ovpn, peer, peer->tcp.skb);
 			} else {
 				/* prepend skb with packet len. this way userspace can parse
diff --git a/drivers/net/ovpn-dco/udp.c b/drivers/net/ovpn-dco/udp.c
index db2b9e5..941dcf7 100644
--- a/drivers/net/ovpn-dco/udp.c
+++ b/drivers/net/ovpn-dco/udp.c
@@ -117,6 +117,8 @@ static int ovpn_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 		goto drop;
 	}
 
+	ovpn_peer_put(peer);
+
 	/* should this be a non DATA_V2 packet, ret will be >0 and this will instruct the UDP
 	 * stack to continue processing this packet as usual (i.e. deliver to user space)
 	 */

Apply the following patch and set a shorter keepalive in the config to see how the peer->refcount changes.

diff --git a/drivers/net/ovpn-dco/peer.c b/drivers/net/ovpn-dco/peer.c
index f9e72fb..bde3d83 100644
--- a/drivers/net/ovpn-dco/peer.c
+++ b/drivers/net/ovpn-dco/peer.c
@@ -24,7 +24,7 @@ static void ovpn_peer_ping(struct timer_list *t)
 {
 	struct ovpn_peer *peer = from_timer(peer, t, keepalive_xmit);
 
-	netdev_dbg(peer->ovpn->dev, "%s: sending ping to peer %u\n", __func__, peer->id);
+	netdev_dbg(peer->ovpn->dev, "%s: sending ping to peer %u refcount %d\n", __func__, peer->id, peer->refcount.refcount.refs.counter);
 	ovpn_keepalive_xmit(peer);
 }

ordex · 2023-10-18T13:18:48Z

This is racy: what guarantee do we have that between scheduling ovpn_decrypt_work() and executing it the peer isn't released and thus leading us to having a stale peer/timer pointer?

ordex · 2023-10-18T13:31:57Z

the statement above is exactly the reason why the reference is obtained before scheduling the worker: we want to be sure that the peer stays alive all time long.
Although the problem is clearly around those lines, this solution is not acceptable.

claw6148 · 2023-10-18T13:46:00Z

This is indeed a problem.

If only modify ovpn_peer_ping() to print the refcount, it will be very large (>1000 or even more) when the problem occurs.
When the iperf3 test is stopped, the refcount does not decrease and remains at a high level.

ordex · 2023-10-18T13:55:27Z

thanks for all these hints. they are very helpful!

ordex · 2023-11-02T08:30:01Z

the master branch contains what we believe to be a fix for this issue. @claw6148 would you be able to give it a try?

claw6148 · 2023-11-02T09:21:41Z

wow, tested several times and no hangings yet

still prints refcount_t: underflow; use-after-free. but rarely happens and not affecting ovpn to be exited (#29)

ordex · 2023-11-02T10:41:36Z

I think that's the problem reported in #29 ? would you mind appending your log there? Thanks!

ordex · 2023-11-02T12:06:09Z

So I'd say this issue can be closed. Thank you all for the feedback provided so far!
Feel free to reopen it should you be able to trigger this issue again.

ordex mentioned this issue Jan 5, 2023

Hangs on unloading #17

Open

ordex added the bug Something isn't working label Jan 15, 2023

ordex self-assigned this Mar 31, 2023

ordex mentioned this issue May 5, 2023

Openvpn 2.6.3 cras "nregister_netdevice: waiting for tun_0 to become free. Usage count = 1" #34

Closed

ordex mentioned this issue Sep 2, 2023

DCO interface error after restarting the openvpn service or restarting the server. #45

Closed

ordex mentioned this issue Sep 27, 2023

netlink reports object not found #50

Open

ordex closed this as completed Nov 2, 2023

claw6148 mentioned this issue Nov 2, 2023

2.6.2: Kernel warning refcount_warn_saturate #29

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Waiting for tunxxx to become free #18

Waiting for tunxxx to become free #18

bernhardschmidt commented Jan 5, 2023

ordex commented Jan 5, 2023

bernhardschmidt commented Jan 9, 2023

bernhardschmidt commented Jan 9, 2023

ordex commented Mar 24, 2023

bernhardschmidt commented Mar 31, 2023

ordex commented Mar 31, 2023

bernhardschmidt commented Mar 31, 2023

hildeb commented Apr 25, 2023

mr-liusg commented May 13, 2023 •

edited

Loading

ordex commented May 13, 2023

mr-liusg commented May 15, 2023

claw6148 commented Oct 17, 2023 •

edited

Loading

cron2 commented Oct 17, 2023

claw6148 commented Oct 17, 2023

ordex commented Oct 17, 2023

claw6148 commented Oct 17, 2023

ordex commented Oct 17, 2023

ordex commented Oct 17, 2023 •

edited

Loading

ordex commented Oct 17, 2023 •

edited

Loading

claw6148 commented Oct 17, 2023 •

edited

Loading

ordex commented Oct 17, 2023

ordex commented Oct 18, 2023

claw6148 commented Oct 18, 2023 •

edited

Loading

ordex commented Oct 18, 2023

ordex commented Oct 18, 2023

claw6148 commented Oct 18, 2023

ordex commented Oct 18, 2023

ordex commented Nov 2, 2023

claw6148 commented Nov 2, 2023 •

edited

Loading

ordex commented Nov 2, 2023

ordex commented Nov 2, 2023

Waiting for tunxxx to become free #18

Waiting for tunxxx to become free #18

Comments

bernhardschmidt commented Jan 5, 2023

ordex commented Jan 5, 2023

bernhardschmidt commented Jan 9, 2023

bernhardschmidt commented Jan 9, 2023

ordex commented Mar 24, 2023

bernhardschmidt commented Mar 31, 2023

ordex commented Mar 31, 2023

bernhardschmidt commented Mar 31, 2023

hildeb commented Apr 25, 2023

mr-liusg commented May 13, 2023 • edited Loading

ordex commented May 13, 2023

mr-liusg commented May 15, 2023

claw6148 commented Oct 17, 2023 • edited Loading

dmesg logs

cron2 commented Oct 17, 2023

claw6148 commented Oct 17, 2023

ordex commented Oct 17, 2023

claw6148 commented Oct 17, 2023

ordex commented Oct 17, 2023

ordex commented Oct 17, 2023 • edited Loading

ordex commented Oct 17, 2023 • edited Loading

claw6148 commented Oct 17, 2023 • edited Loading

ordex commented Oct 17, 2023

ordex commented Oct 18, 2023

claw6148 commented Oct 18, 2023 • edited Loading

ordex commented Oct 18, 2023

ordex commented Oct 18, 2023

claw6148 commented Oct 18, 2023

ordex commented Oct 18, 2023

ordex commented Nov 2, 2023

claw6148 commented Nov 2, 2023 • edited Loading

ordex commented Nov 2, 2023

ordex commented Nov 2, 2023

mr-liusg commented May 13, 2023 •

edited

Loading

claw6148 commented Oct 17, 2023 •

edited

Loading

ordex commented Oct 17, 2023 •

edited

Loading

ordex commented Oct 17, 2023 •

edited

Loading

claw6148 commented Oct 17, 2023 •

edited

Loading

claw6148 commented Oct 18, 2023 •

edited

Loading

claw6148 commented Nov 2, 2023 •

edited

Loading