
dds_delete of participant hangs if no further message is received. #1916

Closed
huntjw opened this issue Jan 8, 2024 · 4 comments · Fixed by #1920

@huntjw

huntjw commented Jan 8, 2024

I have observed that dds_delete of a participant will hang if no further messages are received. ddsi_stop->wait_for_receive_threads->ddsi_join_thread is blocked waiting for the receive thread to exit, and recvmsg in the listening thread is blocked waiting for a message.
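
For illustration, a minimal sketch of the scenario (not a self-contained reproducer; actually hitting the hang depends on the network configuration discussed further down):

#include "dds/dds.h"

int main (void)
{
  dds_entity_t participant = dds_create_participant (DDS_DOMAIN_DEFAULT, NULL, NULL);
  if (participant < 0)
    return 1;
  /* ... normal operation; the receive threads end up blocked in recvmsg ... */
  /* This call should tear down the domain; in the reported case it never
     returns because wait_for_receive_threads waits for the blocked threads. */
  return (dds_delete (participant) == DDS_RETCODE_OK) ? 0 : 1;
}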

Commit: 4572324

Stack trace of dds_delete:
#0 0x00007ffff6af9d2d in __pthread_timedjoin_ex ()
from ext/sdk/ais-sdk-ocs-a/amd64/host/usr/x86_64-linux-gnu/sysroot/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff1f3a683 in ddsrt_thread_join (thread=..., thread_result=0x0) at /mnt/hdd/git/oss/cyclonedds/src/ddsrt/src/threads/posix/threads.c:504
#2 0x00007ffff1ec380d in ddsi_join_thread (thrst=0x6eb900) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_thread.c:419
#3 0x00007ffff1ea50c3 in wait_for_receive_threads (gv=0x6eca88) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_init.c:780
#4 0x00007ffff1ea9731 in ddsi_stop (gv=0x6eca88) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_init.c:1797
#5 0x00007ffff1f00730 in dds_domain_free (vdomain=0x6ec7d0) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_domain.c:331
#6 0x00007ffff1f05b6d in dds_entity_deriver_delete (e=0x6ec7d0) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds__types.h:226
#7 0x00007ffff1f06b43 in really_delete_pinned_closed_locked (e=0x6ec7d0, delstate=DIS_IMPLICIT)
at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_entity.c:547
#8 0x00007ffff1f06845 in dds_delete_impl_pinned (e=0x6ec7d0, delstate=DIS_IMPLICIT)
at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_entity.c:441
#9 0x00007ffff1f06bc8 in really_delete_pinned_closed_locked (e=0x700bd0, delstate=DIS_USER)
at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_entity.c:571
#10 0x00007ffff1f06845 in dds_delete_impl_pinned (e=0x700bd0, delstate=DIS_USER) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_entity.c:441
#11 0x00007ffff1f067be in dds_delete_impl (entity=2144110157, delstate=DIS_USER) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_entity.c:416
#12 0x00007ffff1f066fe in dds_delete (entity=2144110157) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsc/src/dds_entity.c:399

Stack trace of recvMC thread:
#0 0x00007ffff6b03657 in recvmsg () from ext/sdk/ais-sdk-ocs-a/amd64/host/usr/x86_64-linux-gnu/sysroot/lib/x86_64-linux-gnu/libpthread.so.0
#1 0x00007ffff1f37f7b in ddsrt_recvmsg (sock=21, msg=0x7fffde7fb2b0, flags=0, rcvd=0x7fffde7fb288)
at /mnt/hdd/git/oss/cyclonedds/src/ddsrt/src/sockets/posix/socket.c:534
#2 0x00007ffff1e39126 in ddsi_udp_conn_read (conn_cmn=0x6f1090, buf=0x7fffec450070 "", len=65536, allow_spurious=true, srcloc=0x7fffde7fb530)
at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_udp.c:90
#3 0x00007ffff1e37ddc in ddsi_conn_read (conn=0x6f1090, buf=0x7fffec450070 "", len=65536, allow_spurious=true, srcloc=0x7fffde7fb530)
at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi__tran.h:389
#4 0x00007ffff1ec06f4 in do_packet (thrst=0x6eb900, gv=0x6eca88, conn=0x6f1090, guidprefix=0x0, rbpool=0x6f88e0)
at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_receive.c:3315
#5 0x00007ffff1ec128c in ddsi_recv_thread (vrecv_thread_arg=0x6ed2a0) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_receive.c:3527
#6 0x00007ffff1ec3226 in create_thread_wrapper (ptr=0x6eb900) at /mnt/hdd/git/oss/cyclonedds/src/core/ddsi/src/ddsi_thread.c:260
#7 0x00007ffff1f39e96 in os_startRoutineWrapper (threadContext=0x6f8940) at /mnt/hdd/git/oss/cyclonedds/src/ddsrt/src/threads/posix/threads.c:190
#8 0x00007ffff6af86db in start_thread () from ext/sdk/ais-sdk-ocs-a/amd64/host/usr/x86_64-linux-gnu/sysroot/lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007ffff681771f in clone () from ext/sdk/ais-sdk-ocs-a/amd64/host/usr/x86_64-linux-gnu/sysroot/lib/x86_64-linux-gnu/libc.so.6

@eboasson

It sends a packet to itself (or, for recvMC, to the multicast address) to unblock the thread on termination, repeating this once a second until the threads have stopped, to allow for packet loss. Any chance that a firewall is blocking the packets?

You can configure it to multiplex all sockets on a single thread; then it won't need to receive a packet, because it will simply interrupt the select call by sending a byte on a pipe created for this purpose. To do this, set Internal/MultipleReceiveThreads to false in the configuration.

(That is to say:

CYCLONEDDS_URI="<Internal><MultipleReceiveThreads>false</></>" ./yourprocess

should suffice for trying it.)
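
An equivalent configuration-file form, as a sketch (the file path is illustrative; the setting is the Internal/MultipleReceiveThreads option named above):

CYCLONEDDS_URI=file://$PWD/cyclonedds.xml ./yourprocess

with cyclonedds.xml containing something like

<CycloneDDS>
  <Domain Id="any">
    <Internal>
      <MultipleReceiveThreads>false</MultipleReceiveThreads>
    </Internal>
  </Domain>
</CycloneDDS>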

(The issue has cropped up several times; maybe the default is just wrong, and maybe I should look at pthread cancellation and DPCs on Windows to interrupt it instead ...)

@huntjw

huntjw commented Jan 12, 2024

Thank you for the explanation.

I suggest using the select call and pipe in all cases. It is not a good property for dds_delete of a participant to fail because of the network configuration.
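
For reference, a minimal sketch of the select-plus-pipe wake-up pattern being proposed (plain POSIX, illustrative names, not the actual Cyclone DDS code):

#include <sys/select.h>
#include <unistd.h>

/* One receive loop multiplexing a data socket and the read end of a pipe
   created with pipe(2) at startup. */
static void receive_loop (int sock, int wakeup_rd)
{
  for (;;)
  {
    fd_set rdset;
    FD_ZERO (&rdset);
    FD_SET (sock, &rdset);
    FD_SET (wakeup_rd, &rdset);
    const int nfds = (sock > wakeup_rd ? sock : wakeup_rd) + 1;
    if (select (nfds, &rdset, NULL, NULL, NULL) < 0)
      continue; /* e.g. EINTR: just retry */
    if (FD_ISSET (wakeup_rd, &rdset))
      break; /* a byte on the pipe means "terminate"; no network packet needed */
    if (FD_ISSET (sock, &rdset))
    {
      /* recvfrom() and process the datagram here */
    }
  }
}

/* Shutdown path: unblock the loop by writing a single byte to the pipe. */
static void request_stop (int wakeup_wr)
{
  const char c = 0;
  (void) write (wakeup_wr, &c, 1);
}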

@eboasson

@huntjw if you don't mind verifying that my PR #1920 fixes it, that'd be great. Otherwise, I'll go by my own understanding of the code and the problem.

@huntjw

huntjw commented Jan 16, 2024 via email
