-
-
Notifications
You must be signed in to change notification settings - Fork 183
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding a sorted map container? #105
Comments
Depending on how much data you have, an |
Ohh, I see you have <100 events. You may consider just an |
You may also wanna take a look at this: https://en.cppreference.com/w/cpp/algorithm/make_heap |
Do I need to use All event lists will be iterated over, in sorted order (so not a heap) every time the screen is redrawn. It's only edited in response to user input, so 1-20 times a second. I will be inserting and deleting elements at random positions in response to user input, sometimes at the end, sometimes (>50%) in the middle. The audio thread needs to be able to pull a document tree out of an immer::atom, and read events (via indexing and binary search) without taking a lock. And if the audio thread is holding onto a document revision after the main thread deletes it, the audio thread needs to be able to delete the entire tree in bounded time. According to http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing , both malloc and free can take unbounded time and cause audio stuttering. Is deleting immer structures (lots of hierarchy, but pool-allocated) bounded-time? If the audio thread deletes an Honestly this concern is mostly moot, since it's quite unlikely that the audio thread will be holding onto the last reference to something. That would require the user to undo (pushing the current state to redo stack), then perform an action (clearing the redo stack), all within 1-50 milliseconds (depends on how audio is configured). I'm not sure that a categorical "don't hold locks in the audio thread" is even justified. Someone claims that it only matters if your audio thread is running with realtime scheduler priority... Also I love the flying spaghetti monster at https://sinusoid.es/talks/immer-cppcon17/#/3 |
jimbo1qaz <[email protected]> writes:
All event lists will be iterated over, in sorted order (so not a heap)
every time the screen is redrawn. It's only edited in response to user
input, so 1-20 times a second.
I will be inserting and deleting elements at random positions in
response to user input, sometimes at the end, sometimes (>50%) in the
middle.
The audio thread needs to be able to pull a document tree out of an
immer::atom, and read events without taking a lock.
The I would say, a sorted vector is the best option. You really wanna
optimize iteration here, and red-black trees (std::map or a potential
persistent version on Immer) are gonna be slower to iterate over. With
vectors of <100 elements you are not even gonna get that much benefit
from structural sharing anyways. So I would say, use immer::array or
immer::box<std::vector>. You can play with using a heap on top in
combination (with the forementioned std::make_heap), or using
immer::flex_vector.
And if the audio thread is holding onto a document revision after the
main thread deletes it, the audio thread needs to be able to delete
the entire tree in bounded time. According to
http://www.rossbencina.com/code/real-time-audio-programming-101-time-waits-for-nothing
, both malloc and free can take unbounded time and cause audio
stuttering.
Yes, I've worked a lot in the audio field and I am aware of these issues
:)
Is immer (lots of hierarchy, but pool-allocated) bounded-time? Does
`immer::box<std::map>` enable the audio thread to delete a std::map?
Would that be considered bounded-time or not?
It would not.
Honestly this concern is mostly moot, since it's quite unlikely that
the audio thread will be holding onto the last reference to
something. That would require the user to undo (pushing the current
state to redo stack), then perform an action (clearing the redo
stack), all within 1-50 milliseconds (depends on how audio is
configured).
You are right, it seems almost moot. Still, if you wanna do it "right",
there are some possible solutions.
One of them is to play with custom allocators, but there is still the
problem of the potential traversal of the document itself as a result of
the refcount cycle. I am in discussions with a potential commercial
client of Immer that would like to implement a solution to this directly
in the library.
However! Given your nice single-atom based design (a good one!) you
have a simple solution you can implemnt itself. Add a property
`std::size_t generation` to the root of your document. Every time you
change the document, producing a new one, you increase it, and put the
document in a main-thread controlled list. Then, you also have
`std::atomic<std::size_t> last_seen_generation` variable that the audio
engine, that the audio thread increases once it is done processing a
document (sets it to the generation of that document). Periodically,
the main thread removes from its list every document older than
`last_seen_generation`, potentially destroying them if they are
forgotten. Depending on the structure of your audio engine, there is
still a potential situation that the audio engine somehow holds on to
parts of the document even after being done "processing" it. That is
still a rare case and can be easily avoided.
I'm not sure that a categorical "don't hold locks in the audio thread"
is even justified, but I'm assuming that's the case and I need immer.
Sure! Immer is good for audio apps, I had them in mind as one my main
use-cases ;)
Cheers!
JP
PS. If this is a commercial project and you want more specific advice,
code review, etc, we could do a consulting project under NDA.
…--
∿∿∿ https://sinusoid.al
∿∿∿ https://sinusoid.es
|
Thanks for the help! This is not a commercial project. I actually came up with the single-atom idea talking to user bowbahdoe (don't want to ping, idk?) experienced in Clojure. It seems you're reading and replying via email, so you missed my edit where I said "I'm not sure that a categorical "don't hold locks in the audio thread" is even justified. Someone claims that it only matters if your audio thread is running with realtime scheduler priority..." |
Right now I'm working on an immutable ordered map for one of my own projects that I'd be happy to contribute to immer. Inserts/removes optimally reuse subtrees instead of making deep copies, leaving inserts and removes at O(log n), and searches of course are O(log n). Like std::map, it's implemented via a red-black tree. |
Thanks a lot for @voxoid0! I think that would be nice! However, one can get better performance (due to cache utiliziation) using B-trees, that's what these libraries do: immutable.js-sorted: https://github.com/applitopia/immutable-sorted However, node-base trees can also be interesting in a functional language, like red-black trees, and others. If you are interested these are the ones that Data.Map on Haskell implements: http://groups.csail.mit.edu/mac/users/adams/BB/ |
Ah interesting. I searched for something that discusses the causes and trade-offs in B-tree performance for immutable ordered maps, but couldn't find it; do you know of something that goes into detail? Perhaps you can tell me if my guess is correct: a B-tree, by being able to use more than 2 branches per node, finds a tradeoff between the cache hits enabled by storing multiple child pointers in a contiguous array in memory, versus the additional cost O(c) of inserting or deleting on a node's child pointer list, where c is the avg number of children per node. (Both vector and list children would have O(c) bounded performance but for different reasons.) Insertion or deletion on the tree (and subsequent copying of a node's ancestry) would incur an additional cost in memory and perhaps performance, since when an ancestral line of nodes is rewritten after a write, a wider subtree is affected since each node branches more than 2. I wonder of the performance would depend on frequency of write vs read (or subtree iteration). |
That is a correct general assumption. Reads are expected to be much faster because the tree is way shallower and more of the data is in contiguous blocks, less traversal and more cache efficiency. Ideally you would use block sizes similar to what we use for vectors (have you seen my talks on the topic?) Inserting performance may be worse. I am not a 100% sure, because it depends on the amount of rebalancing you need on the red-black-tree, and on implementation details. I would in general be happy trade off a bit of write performance for faster reading. |
Nice paper and talk; I looked over them the past couple of days. It will be interesting to see how your findings translate to an unordered map. I realized last night that I need to work top-down in my project a bit more first to confirm the appropriate data structures, before moving on with the immutable ordered (multi) map impl. Will give an update afterwards. It happens to be a music project as well :) (algorithmic composition). |
Oh nice! I am actually actually working in that field now with these guys: https://bronze.ai/ And yeah, it sounds good to try not over-engineer with data-structures first. For now for example there is a place where I am needing one and I am simply using a |
i've been working on an immutable b+ tree that would cover this use case as well as #158. by default the internal nodes just store pointers and there are 3 mixins that can decorate them with child counts, max keys, and/or weights which means it can implement either an indexed or an ordered interface (or both although this doesn't seem particularly useful in practice) as well as allowing the implementation of various ordered statistics datastructures and other things requiring fast prefix sum. on a dataset of 10 million 32 bit ints, the random lookup performace is around 2x slower than flex vector but the random insert performance is around 4x faster, and the design made it easy to support random insert on the transient version which is around 100x faster than random insert on a persistent flex vector. the code is still a work in progress and neither the interfaces nor the style are particularly consistent with immer right now, but if this is something you'd be interested in incorporating into immer i can share it and we can go from there. |
well it's almost 3 years later, but if anyone is still interested in an immutable sorted map container, i finally finished my immutable b+ tree library https://github.com/jeffplaisance/BppTree |
That is trully amazing @jeffplaisance. Great work! A quick glance is telling me that this is a high quality library... The API and coding style is a bit different than the Immer one, but I would love adapting it and integrating it into Immer... |
Thanks! It would be awesome for this to be part of immer. I changed all the camelCase to snake_case (all the types are still PascalCase) which may make things slightly easier. Some of the API differences are probably due to the requirement to have unique names for every method across all the mixins so they don't hide each other when combined, which is not ideal, but seemed like the least bad option. There might be a way around it by putting all the API methods in the BppTreeDetail::Shared/Transient/Persistent classes and leveraging SFINAE to disable the ones that aren't applicable, but I was worried that might make IDE autocomplete even less effective than it currently is. |
Yes... I think another thing to think about is how memory is managed in the two libraries. But it would be super interesting to have this in... Hopefully I find some time to go through this in the future, or someone else does ;) |
Following up on this thread to say that jank (native Clojure dialect on LLVM) is currently using immer for maps/vectors/sets and I'm now finding myself in need of sorted maps and sorted sets. Having these in immer would be superb, but I'll take a look at BppTree in the meantime. Thanks for all of the work here, folks! ❤️ |
For a music editing program, I'm storing a series of timestamped events (note data) in an Immer container. I would like to use the timestamps as keys in a sorted map, but Immer lacks sorted maps. I'm trying to emulate it using a wrapper around a sorted vector of
struct{time, event}
(usually under 100 events so copying isn't awful), but having to reimplement all operations is extra work.Should I just stuff an
immer::box<std::map>
or something in an Immer container?The text was updated successfully, but these errors were encountered: