Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DataWriter appears to receive notification about new loopback DataReader before it is ready to receive data #2130

Open
matthew-ivester opened this issue Nov 7, 2024 · 2 comments

Comments

@matthew-ivester
Copy link

matthew-ivester commented Nov 7, 2024

While trying to clean up and improve some unit tests in code using Cyclone, I came across what looks like buggy (or at least undesirable) behavior around matching of local (ie, loopback) DataReaders and DataWriters.

The sequence of events is:

  • A DataWriter exists for a topic, using a reliable QoS setting (but not TransientLocal)
    • we create a thread that blocks on a condition variable waiting for publication_matched_status().current_count() to reach 2 or more
    • In the on_publication_matched() callback on the DataWriterListener we notify the CV to wake up any waiting threads
  • Two DataReaders are created for the topic
  • When the first DataReader is created, on_publication_matched() is triggered on the DataWriterListener
    • The waiting thread wakes up, calls publication_matched_status.current_count(), gets 1, goes back to sleep
  • When the second DataReader is created, on_publication_matched() is triggered on the DataWriterListener
    • The waiting thread wakes up, calls publication_matched_status.current_count(), gets 2
  • The waiting thread continues and calls write() on the DataWriter
  • Only the first DataReader actually receives the sample

This seems like a race condition, as usually it works fine and both readers get the sample. Occasionally a unit test will fail because the last DataReader that was created doesn't receive anything. I would expect that on_publication_matched() would only be triggered once a write call will successfully deliver to the new DataReader. I've only seen this happen when we're creating two or more readers, so there may be some difference in behavior with how the first reader+writer are matched compared to subsequent ones.

For the moment we've worked around this by putting the waiting thread back to sleep for a short time (~5ms) after it sees the matched subscriber count tick up to the threshold value. That seems to fix it. But it would be better if this worked reliably.

I can reproduce this with our product's stack, but I don't currently have a straightforward test that can do so on top of cyclonedds alone. I can try to create one if needed, or see if we can get it to occur with more detailed logging enabled in Cyclone.

@eboasson
Copy link
Contributor

That's intriguing (a.k.a. it is not immediately obvious to me why that race condition would exist), and I certainly consider it wrong not to deliver data to readers for which you have already received a "publication matched" notification.

The scenario is very simple, so perhaps I should simply try it myself first. I have plenty of test code I can copy-and-enhance 🙂

@eboasson
Copy link
Contributor

No luck yet in reproducing this ...

You can find my attempt at eboasson@dee288c, perhaps you can see if it seems like it should exhibit the same behaviour?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants