RCU + Sync/async reads. #8
Cf. this example. It uses four semaphores: two counting semaphores that track the number of items in the circular buffer, so it is known when the buffer is empty (consumers must wait) and when it is full (producers must wait); and two binary semaphores used as mutexes to guard the indices (pointers) to the head and the tail of the circular buffer. The current ImageStreamIO instead has only an array of functionally equivalent semaphores, the purpose of which is not yet clear to me. We don't need to control the individual elements of the circular buffer; we need to control the pointers to its head and tail. Sidebar: I have yet to find an INDI driver (application) in MagAOX that actually uses the circular buffer: every shmimMonitor (shmim consumer) instead skips over the earliest elements (images), uses only the latest element, and "flushes" the shmim semaphore* to 0. * roughly equivalent to the inverse of semaphore b->empty in the example linked above |
Going forward, are we agreed that ImageStreamIO, as it exists today, does not implement any thread-safe circular buffer control, other than perhaps enabling producers and consumers to detect when the circular buffer is full and empty, respectively? |
Circular buffers are seldom used. We plan to use them for data logging to relax timing requirements on the data loggers. |
Quick addition: it's completely possible to foresee using ISIO as an IPC mechanism where synchronization is necessary (but, that's currently mostly out of scope). |
I detect some mismatch in what we're talking about as circular buffers. @drbitboy the multiple semaphores are to support multiple consumers -- each consumer watches its own semaphore uniquely. This has nothing to do with circular buffers. We do in fact support ImageStreamIO circular buffers in MagAO-X, which I think is what @oguyon is referring to, including in the shmimMonitor CRTP base class. What do we mean here by "thread-safe circular buffers"? There is absolutely no concept of multiple writers to the same stream in ImageStreamIO, so there is only ever one process needing to decide where to write next. The issue I worry about is that there is a time when the writer has updated |
@jaredmales - thanks for clarifying the distinction between circular buffers and semaphores. Handling race conditions with "non-atomic" stream updates:
This ensures that all data have been updated by the time the semaphores are posted (the last operation). This is implemented by the function ImageStreamIO_UpdateIm. On the consumer end, processinfo_waitoninputstream properly handles synchronization. At a higher level, this is embedded within the macro INSERT_STD_PROCINFO_COMPUTEFUNC_LOOPSTART. I -think- that if we stick to processinfo_update_output_stream (producer) and processinfo_waitoninputstream (consumer), there is no undefined behavior or race condition... but it would be very good to double- and triple-check this! |
For anyone confused about how streams, procinfo and FPS interact together, I recommend running "milk-tutorial" and following/reading the instructions. Also run "milk-streamCTRL", "milk-fpsCTRL" and "milk-procCTRL" to follow along. |
the processinfo_* stuff is not in ImageStreamIO, rather it's in the CLI right? |
Correct. |
same for MagAO-X (I think our code was mostly written before processinfo reached its current form) |
I am very curious to hear how that works. I've been looking at the code for some weeks now and I still don't see it. The point of semaphores is that multiple processes use one semaphore -- not multiple semaphores -- as a proxy for access to some resource, or to pointers (such as ->cnt1, and maybe ->cnt0) into an array of resources. The semaphore ensures, in the canonical case, that multiple processes cannot access a resource while it is being modified: either a consumer drives the (binary) semaphore to zero, which stops the producer from modifying the resource, or the producer drives it to zero, which prevents the reader from reading it. Normally every process interested in a resource should both post to and wait on that resource's semaphore, but the logic I have seen has one side only posting and the other side only waiting. And then there is the poor man's mutex in that non-mutex, non-thread-safe ->write flag. |
@drbitboy this may be useful: flip through slides 5+ |
Okay. So on Slides 8 and 9, the frameGrabber posts to all semaphores which declares "release the hounds" i.e.
I understand all that. My question is this: where is the logic that stops the frameGrabber from writing new data to the WFS image shmim during that processing on Slides (10 and) 11-14, while those second-stage processes are reading from that same WFS image shmim? Without logic to block the frameGrabber from writing to that shmim, that is not "using semaphores," and that code is not thread-safe. |
With a one-image (one-element) buffer (i.e. no circular buffer), and assuming* the primary issue is that a frameGrabber should not overwrite data in a shmim while a downstream process is reading it, here is how semaphores are meant to work: cacao_minimal_drbitboy.pdf. The key item to note is that both producers and consumers use both sem_post and sem_(timed/try)wait on the same semaphores.

One issue not covered there is initialization. Also, each downstream process (reader) will need to keep track of a timestamp or other incrementing value for each image, to ensure it does not try to grab (sem_wait) the semaphore and read the same image more than once.

* I understand that this assumption may not be accurate, but that is the problem semaphores are designed to solve. If the primary issue is something else, then semaphores may not be the solution. Because, AFAICT, the current code allows an upstream frameGrabber to run free and potentially overwrite data while downstream shmimMonitors are reading those same data. If that is not happening, it is because the downstream shmimMonitors are fast enough to keep up with the upstream frameGrabber; it is not because of the current semaphore-based implementation. |
Hey @drbitboy, We're indeed running under the specification that what matters is that downstream consumers (C) cannot by any means stall an upstream producer (P). This is by design, and I 100% agree that this is an oddball in the software world.
Short answer: None, by design. Design and defaults can be reconsidered :)
Now, a few things can happen:
Of the 3 cases, 2 and 3 result in data loss, and only 1 opens the way for data corruption (and that is a problem). Dropping the occasional frame in the AO loop is not too big a problem; data corruption is more of a problem... even though one image and the next are really similar. |
In terms of AO performance, the best option when the consumer is late is to read while the data is being written. It is better to read a "corrupted" frame that has only been partially updated than to wait for a whole new one. Data corruption is better than data loss because, in this context, data corruption is partial data loss (part of a frame), while waiting for the write to complete means losing a full frame. This is the current default (as it should be). On x86 systems, updating values in an array is atomic at the 8-byte level (unless the memory is unaligned), so we should be safe with data types spanning 64 bits or less. IMPORTANT: should we support hardware platforms where updating values is non-atomic? Do such systems exist? |
correction: If write duty cycle is <<1, it is better to wait for the new frame as it is likely "just around the corner" |
I would think that, if any downstream process cannot keep up, the best we can do is drop (i.e. skip) the (hopefully) occasional complete frame, because the only other option is corrupt data. And if corrupt data are not a deal breaker, then why waste any effort on an implementation of semaphores that is not doing anything? Because, the way it is coded now, it is not. |
Missed frames should be low probability events, and we are making it a requirement that the consumer is nearly always on time. Typical missed frame rate should be ~< 1e-6. No strict requirement, but missed frames have a very negative impact on performance. ImageStreamIO is first and foremost designed assuming that the consumers are significantly faster than the producer, but we still need to handle missed/late frames. |
what is the significance of a semaphore value exceeding 1 (i.e. consumer misses a frame)? Or to put it another way, what if a consumer checks the value of a semaphore after completing a wait on that semaphore, and that semaphore's value is not 0? Do we care? Because AFAICT the current code does not care. |
Anything > 0 is a signal to "eat all the data". |
I guess what I am getting at is that the semaphore is probably only a little more efficient for the consumer than waiting for cnt0 or cnt1 to increment, but it is basically the same thing. |
If the data are not in a circular buffer i.e. of length > 1, then there are no data to eat. |
The original thinking was leveraging semaphore counts so that the reader can be informed it woke up late by accessing the semaphore and the semaphore only.
An interesting direction would be to check that the semaphore is still 0 as soon as a reader is done with all its accesses to the SHM. If >0, data corruption is possible and the reader may decide to toss the data. If ==0, then the faster-than-refresh-rate reader hypothesis is confirmed for this iteration. Most of that could be bundled in the procinfo macros for reading the input stream of a process, or in new ImageStreamIO functions. |
I just realized: when the location is the CPU and memory is shared, the image->array.raw memory is separate from the circular-buffer memory. Also, there is no allocation for the circular buffer if the CBsize parameter passes a 0 (or non-positive) value to ImageStreamIO_createIm_gpu. Is that correct? If yes, then ->array.raw is the "working" memory holding the latest data from the producer, and the circular buffer is meant as a historical record or log, so that if the consumer falls behind the working memory, there might still be some past data in the circular buffer, perhaps representing several semaphore-posting (incrementing) events. This would also mean the semaphore only coordinates access to the ->array.raw memory, and there are no thread-safe traffic controls for accessing the circular-buffer memory. Does that all sound right? |
This is all correct. Rationale (to be discussed): ISIO was optimized for speed and latency, so the working memory holds only the current frame, to minimize the memory footprint of RT processes and reduce the cost of maintaining cache coherency. The working memory may be shared between a large number of processes (one writer, multiple readers). The CB buffer was created as a way to store recent data so that non-RT processes can grab data without missing frames. This is intended for logging data: the logger then doesn't need tight RT requirements, and can be late by a few frames and catch up. We did not do any benchmarking. |
Thank you! |
I am putting an operational test together i.e. more than "I can create a new, or open an existing, shmim." It probably has too many moving parts to be called a "unit" test, but it will be as minimal as possible. |
I'm trying to find a standard, safe and efficient mechanism to handle circular buffers.
Julien found this one which seems to be a good candidate : https://lwn.net/Articles/262464
https://wiki.linuxfoundation.org/realtime/documentation/technical_details/rcu