While trying to clean up and improve some unit tests in code using Cyclone, I came across what looks like buggy (or at least undesirable) behavior around matching of local (i.e., loopback) DataReaders and DataWriters.
The sequence of events is:
- A `DataWriter` exists for a topic, using a reliable QoS setting (but not `TransientLocal`)
- We create a thread that blocks on a condition variable waiting for `publication_matched_status().current_count()` to reach 2 or more
- In the `on_publication_matched()` callback on the `DataWriterListener` we notify the CV to wake up any waiting threads
- Two `DataReader`s are created for the topic
- When the first `DataReader` is created, `on_publication_matched()` is triggered on the `DataWriterListener`
- The waiting thread wakes up, calls `publication_matched_status().current_count()`, gets 1, goes back to sleep
- When the second `DataReader` is created, `on_publication_matched()` is triggered on the `DataWriterListener`
- The waiting thread wakes up, calls `publication_matched_status().current_count()`, gets 2
- The waiting thread continues and calls `write()` on the `DataWriter`
- Only the first `DataReader` actually receives the sample
This looks like a race condition: usually it works fine and both readers get the sample, but occasionally a unit test fails because the last `DataReader` that was created doesn't receive anything. I would expect `on_publication_matched()` to be triggered only once a `write()` call will successfully deliver to the new `DataReader`. I've only seen this happen when we're creating two or more readers, so there may be some difference in behavior between how the first reader+writer pair is matched and subsequent ones.
For the moment we've worked around this by putting the waiting thread back to sleep for a short time (~5 ms) after it sees the matched subscriber count tick up to the threshold value. That seems to fix it, but it would be better if this worked reliably.
I can reproduce this with our product's stack, but I don't currently have a straightforward test that can do so on top of cyclonedds alone. I can try to create one if needed, or see if we can get it to occur with more detailed logging enabled in Cyclone.
That's intriguing (a.k.a. it is not immediately obvious to me why that race condition would exist), and I certainly consider it wrong not to deliver data to readers for which you have already received a "publication matched" notification.
The scenario is very simple, so perhaps I should simply try it myself first. I have plenty of test code I can copy-and-enhance 🙂