DataWriter appears to receive notification about new loopback DataReader before it is ready to receive data #2130

matthew-ivester · 2024-11-07T19:41:36Z

While trying to clean up and improve some unit tests in code using Cyclone, I came across what looks like buggy (or at least undesirable) behavior around matching of local (i.e., loopback) DataReaders and DataWriters.

The sequence of events is:

- A DataWriter exists for a topic, using a reliable QoS setting (but not TransientLocal)
  - We create a thread that blocks on a condition variable waiting for publication_matched_status().current_count() to reach 2 or more
  - In the on_publication_matched() callback on the DataWriterListener we notify the CV to wake up any waiting threads
- Two DataReaders are created for the topic
- When the first DataReader is created, on_publication_matched() is triggered on the DataWriterListener
  - The waiting thread wakes up, calls publication_matched_status().current_count(), gets 1, goes back to sleep
- When the second DataReader is created, on_publication_matched() is triggered on the DataWriterListener
  - The waiting thread wakes up, calls publication_matched_status().current_count(), gets 2
- The waiting thread continues and calls write() on the DataWriter
- Only the first DataReader actually receives the sample
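
For anyone trying to reproduce this, the waiting logic described above can be sketched roughly as follows. This is a minimal, DDS-free sketch: `MatchWaiter`, `notify()` and `wait_for_matches()` are hypothetical names made up for illustration and are not part of Cyclone DDS. The assumption is that `notify()` would be called from on_publication_matched() with the status's current_count(), and that the test thread calls `wait_for_matches()` before write().

```cpp
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical helper (not Cyclone DDS API): records the latest matched-reader
// count and lets a test thread block until it reaches a threshold.
class MatchWaiter {
public:
    // Assumed to be called from on_publication_matched() with
    // status.current_count().
    void notify(std::int32_t current_count) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            current_count_ = current_count;
        }
        cv_.notify_all();  // wake any thread blocked in wait_for_matches()
    }

    // Blocks until at least `threshold` readers are matched or `timeout`
    // expires. Returns true if the threshold was reached.
    bool wait_for_matches(std::int32_t threshold,
                          std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate is re-checked on every wake-up, so the thread goes
        // back to sleep when the count is still below the threshold.
        return cv_.wait_for(lock, timeout,
                            [&] { return current_count_ >= threshold; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    std::int32_t current_count_ = 0;
};
```

In the failing case, `wait_for_matches(2, ...)` returns true and the write() that follows still reaches only the first reader.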

This seems like a race condition, as usually it works fine and both readers get the sample. Occasionally a unit test will fail because the last DataReader that was created doesn't receive anything. I would expect that on_publication_matched() would only be triggered once a write call will successfully deliver to the new DataReader. I've only seen this happen when we're creating two or more readers, so there may be some difference in behavior with how the first reader+writer are matched compared to subsequent ones.

For the moment we've worked around this by putting the waiting thread back to sleep for a short time (~5ms) after it sees the matched subscriber count tick up to the threshold value. That seems to fix it. But it would be better if this worked reliably.
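
The workaround amounts to something like the sketch below. It is written without any DDS dependency: `wait_and_settle` is a hypothetical helper, `matched_count` stands in for a call such as publication_matched_status().current_count(), and polling is used here instead of the condition variable only to keep the example short. The 5 ms default mirrors the delay mentioned above and is not a value known to be sufficient in general.

```cpp
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper illustrating the workaround (not Cyclone DDS API).
// Polls `matched_count` until it reaches `threshold`, then sleeps for an
// extra `settle` period before returning, to give the newest reader time
// to become ready. Returns false if `timeout` expires first.
inline bool wait_and_settle(
    const std::function<int()>& matched_count,
    int threshold,
    std::chrono::milliseconds timeout,
    std::chrono::milliseconds settle = std::chrono::milliseconds(5)) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (matched_count() < threshold) {
        if (std::chrono::steady_clock::now() >= deadline) {
            return false;  // never saw enough matched readers
        }
        std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    // The extra delay only hides the race; it does not remove it.
    std::this_thread::sleep_for(settle);
    return true;
}
```

A fixed delay like this is timing-dependent, which is why it is only a stopgap for the tests.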

I can reproduce this with our product's stack, but I don't currently have a straightforward test that can do so on top of cyclonedds alone. I can try to create one if needed, or see if we can get it to occur with more detailed logging enabled in Cyclone.

eboasson · 2024-11-12T16:22:34Z

That's intriguing (a.k.a. it is not immediately obvious to me why that race condition would exist), and I certainly consider it wrong not to deliver data to readers for which you have already received a "publication matched" notification.

The scenario is very simple, so perhaps I should simply try it myself first. I have plenty of test code I can copy-and-enhance 🙂

eboasson · 2024-11-19T10:54:16Z

No luck yet in reproducing this ...

You can find my attempt at eboasson@dee288c. Perhaps you can check whether it looks like it should exhibit the same behaviour?
