Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Experimenting towards allocator-free rendering #368

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

orottier
Copy link
Owner

@orottier orottier commented Oct 5, 2023

The goal is to run cargo run --example audio_buffer_source_events without failing for now.
It shows that the #366 works like intended, there is no deallocation of the audio buffer taking place.
But this PR made me realize we also need to look into all Arc<..> items of renderers, because it now fails with

  13: <alloc::sync::Arc<T> as core::ops::drop::Drop>::drop
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/alloc/src/sync.rs:1897:13
  14: core::ptr::drop_in_place<alloc::sync::Arc<web_audio_api::AtomicF64>>
             at /rustc/d5c2e9c342b358556da91d61ed4133f6f50fc0c3/library/core/src/ptr/mod.rs:497:1

And then there's still more issues to look at (Box<dyn AudioProcessor> for example)

@orottier orottier requested a review from b-ma October 5, 2023 18:55
@orottier orottier changed the base branch from main to feature/async-deallocations October 5, 2023 18:55
@@ -128,7 +128,7 @@ struct AudioParamEventTimeline {
impl AudioParamEventTimeline {
fn new() -> Self {
Self {
inner: Vec::new(),
inner: Vec::with_capacity(5),
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a real solution to the problem

/// Reusable output buffers, consumed by subsequent Nodes in this graph
outputs: Vec<AudioRenderQuantum>,
outputs: SmallVec<[AudioRenderQuantum; 2]>,
Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are not real solutions to the deallocation problems of this struct

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe I'm missing something but it seems to me that the only nodes that have multiple inputs / outputs are the ChannelSplitterNode and ChannelMergerNode which are clamped to MAX_CHANNELS, cf. https://webaudio.github.io/web-audio-api/#dom-baseaudiocontext-createchannelmerger, no?

Then maybe we could simply use an ArrayVec<AudioRenderQuantum, MAX_CHANNELS> to fix this?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True that, but of course we also allow users to implement their own AudioNodes which I think could use an unbounded number of inputs/outputs.

Scratch that, in the JS interface the https://webaudio.github.io/web-audio-api/#AudioWorkletNode is bound to 1 input and 1 output. So technically I could restrict the raw AudioNode trait and panic when the users supplies a 32+ input/output count.

We need to check performance though, ArrayVec<AudioRenderQuantum, MAX_CHANNELS> is a rather large object to have on the stack, and rust will probably memcpy it around a lot, so we may want to have it on the heap anyway. We can experiment with pre-allocating that Vec on the control thread

@@ -11,6 +11,9 @@ use super::{Alloc, AudioParamValues, AudioProcessor, AudioRenderQuantum};
use crate::node::ChannelConfig;
use crate::render::RenderScope;

const INITIAL_GRAPH_SIZE: usize = 16;
const INITIAL_CHANNEL_DATA_COUNT: usize = INITIAL_GRAPH_SIZE * 4;
Copy link
Collaborator

@b-ma b-ma Oct 6, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would not per se solve the allocation problem, but taking a very large value for INITIAL_GRAPH_SIZE could help (4096 or 8185), as vector growth seems to be exponential (cf. https://nnethercote.github.io/perf-book/heap-allocations.html?highlight=borrow#vec-growth)

@b-ma
Copy link
Collaborator

b-ma commented Oct 6, 2023

Cool, this is very interesting! I will try to have a closer look on this and the previous PR this week-end (I'm a bit running to find some time these weeks...)

@orottier orottier changed the base branch from feature/async-deallocations to main October 27, 2023 11:30
@orottier orottier changed the base branch from main to feature/store-nodes-in-vec-not-hashmap October 27, 2023 12:05
Base automatically changed from feature/store-nodes-in-vec-not-hashmap to main October 30, 2023 09:37
@orottier orottier changed the title Don't merge: experimenting towards allocator-free rendering Experimenting towards allocator-free rendering Oct 30, 2023
@orottier orottier mentioned this pull request Oct 30, 2023
@b-ma
Copy link
Collaborator

b-ma commented Nov 5, 2023

Hey, I just made a small hacky experiment on how a renderer could just drop itself in GC with llq and impl Drop: https://gist.github.com/b-ma/53094341fd51c1ed1ab165b840ea691b

That's rather weird I must confess :), but it seems to work... maybe this can give some ideas

@orottier
Copy link
Owner Author

orottier commented Nov 6, 2023

Hey, I just made a small hacky experiment on how a renderer could

Well, this is definitely a new use case for recursive data structures. Thanks ;)

However, I'm getting more and more convinced that the dyn AudioProcessor should probably drop inside the render thread, and then defer the actual deallocation of the Box to the garbage collector thread. Something like this (warning, unsafe and untested)

pub fn deallocate_audio_processor(&self, value: Box<dyn AudioProcessor>) {
    let p = Box::into_raw(value);
    unsafe {
        ptr::drop_in_place(p); // drop the actual AudioProcessor fields
        dealloc(p as *mut u8, Layout::new::<Box<dyn AudioProcessor>>()); // TODO - perform this call by GC thread
    }
}

The reason I want the processor to drop inside the render thread is that I don't want to put the burden of clearing your Vec<AudioRenderQuantum> etc on the users of our library. If they forget, and we ship the non-empty vec outside the render thread a data race on the inner Rc may occur resulting in undefined behaviour.

Let me share my current plans:
The Send bound on AudioProcessor bugs me and requires us to use unsafe already e.g. https://github.com/b-ma/web-audio-api-rs/blob/main/src/node/delay.rs#L289 https://github.com/b-ma/web-audio-api-rs/blob/main/src/node/dynamics_compressor.rs#L264
This means any user of our library writing advanced processors will run into this too - at least any processor that uses a ring buffer.
I intend to remove it and come up with a way to instantiate the renderer inside the render thread, as opposed to the current situation where the renderer is instantiated inside the control thread and then shipped.
This will make the AudioProcessor trait more complex. However I am making progress on a true rust AudioWorkletNode API which will be easier to use and aligns more with the specification - and which will be useful for ircam-ismm/node-web-audio-api#28
Then only expert users will need to use the AudioProcessor trait - and perhaps we can even remove it from the public API altogether.
I will open a draft PR around this shortly

@b-ma
Copy link
Collaborator

b-ma commented Nov 6, 2023

ok, not sure I understand every detail (...pretty sure I actually don't, let's be honest), but looking forward to see the thing :)

@b-ma
Copy link
Collaborator

b-ma commented Nov 6, 2023

(my experiment is quite fun anyway! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants