
Enable sub quantum delay #198

Merged
merged 23 commits into orottier:main from feature/zero-delay on Aug 15, 2022

Conversation

b-ma
Collaborator

@b-ma b-ma commented Aug 5, 2022

Hey, here is a first draft on #79

Don't merge this yet; I'm really not sure it's finished, but it would be nice to have your feedback on what I have done so far. As this is related to the graph, I'm a bit nervous because I'm not sure I fully understand the visit/order stuff :)

  • I had to remove a test which is now failing; it should be reviewed because it was checking against clamped delay values. I'm not sure why it breaks, I need to dig more into that.
  • I'd also like to add more tests to be sure things behave as expected.

Apart from the one I mentioned, all the other tests seem to be happy.

@b-ma
Collaborator Author

b-ma commented Aug 9, 2022

Ok I fixed the issue and added some more tests. Let me know if you are ok with the strategy I used in the graph!

// store node id to clear the node outgoing edges
cycle_breakers.push(node_id);
// remove nodes from mark temp after pos
marked_temp.truncate(pos);
Collaborator Author

I'm not completely sure of this

Owner

Yeah, it's quite hard to reason about. My gut feeling is that this implementation is too optimistic. It would be safer to clear out marked_temp and also remove these nodes from marked, then start visiting again from the first item that was in marked_temp.

Collaborator Author

Maybe a nice thing to have would be a unit test with a really weird edge case: some cycle inside a cycle, I guess, to check this behaves as expected (whatever the solution)?
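For illustration, the strategy under discussion can be sketched as a temporary-mark DFS. This is a simplified stand-in for the crate's actual graph code, and the cycle-breaker scan (searching the detected cycle for a node that may break it) is an assumption about the intended behaviour, not the real implementation:

```rust
use std::collections::HashMap;

// Depth-first ordering sketch: `marked_temp` holds the current DFS path, so
// re-visiting a node on that path means we closed a cycle. The cycle is the
// slice marked_temp[pos..]; we scan it for a node that can break the cycle
// (e.g. a delay) and record it in `cycle_breakers`.
fn order_nodes(
    edges: &HashMap<u32, Vec<u32>>,
    can_break: &dyn Fn(u32) -> bool,
    node: u32,
    marked: &mut Vec<u32>,
    marked_temp: &mut Vec<u32>,
    ordered: &mut Vec<u32>,
    cycle_breakers: &mut Vec<u32>,
) {
    if let Some(pos) = marked_temp.iter().position(|&n| n == node) {
        // cycle detected: marked_temp[pos..] is the cycle; find a breaker in it
        if let Some(&breaker) = marked_temp[pos..].iter().find(|&&n| can_break(n)) {
            cycle_breakers.push(breaker);
        }
        return;
    }
    if marked.contains(&node) {
        return; // already fully visited via another path
    }
    marked.push(node);
    marked_temp.push(node);
    for &child in edges.get(&node).into_iter().flatten() {
        order_nodes(edges, can_break, child, marked, marked_temp, ordered, cycle_breakers);
    }
    // node fully visited: drop the temporary mark, prepend to the ordering
    marked_temp.retain(|&n| n != node);
    ordered.insert(0, node);
}
```

With a tiny graph 1 → 2 → 3 → 2 where only node 3 can break cycles, a traversal from 1 records 3 as the breaker and orders the nodes 1, 2, 3.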

Owner

@orottier orottier left a comment

Great to see we are nearing a solution for this issue.
Very happy to see the extensive tests; they will help us iterate on the solution while checking that all works well.
I dropped two notes, but I do not have a solution ready at hand. Would you like me to take the work from here and spend some time on it, or will you have another go?

@@ -26,6 +26,10 @@ pub struct RenderScope {
///
/// Check the `examples/worklet.rs` file for example usage of this trait.
pub trait AudioProcessor: Send {
fn can_break_cycle(&self) -> bool {
Owner

I understand why you have put this here, but adding this to the AudioProcessor is a big thing because it forces all our users to have a look at it when implementing custom processors. Let's try to put this somewhere else
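For reference, the draft approach being discussed would look roughly like this trimmed-down sketch (the crate's real trait has many more required methods; only the part relevant to this review comment is shown):

```rust
// Sketch of the draft: a defaulted method on the processor trait, which only
// the delay-reader style node overrides. The review argues against this shape
// because every author of a custom processor now encounters the method.
trait AudioProcessor: Send {
    fn can_break_cycle(&self) -> bool {
        false // default: ordinary nodes cannot break a cycle
    }
}

// hypothetical delay node: the single processor that opts in
struct DelayReader;

impl AudioProcessor for DelayReader {
    fn can_break_cycle(&self) -> bool {
        true
    }
}
```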

Owner

We cannot really use downcasting for this problem (e.g. https://stackoverflow.com/a/33687996) without adding an as_any method to all processors, which is undesirable too.

Another solution would be to explicitly store is_cycle_breaker in the Node, and set that value with a call
ConcreteBaseAudioContext::markCycleBreaker(&self, node_id: &AudioNodeId) that passes the info to the render thread
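A rough sketch of that alternative, with all names hypothetical (the crate's real control-to-render message enum and node representation differ):

```rust
// The control thread marks a node as a cycle breaker and ships that flag to
// the render thread over the existing message channel, instead of asking the
// processor trait. All identifiers here are illustrative.
enum ControlMessage {
    MarkCycleBreaker { node_id: u64 },
    // ...other message variants elided
}

struct Node {
    id: u64,
    is_cycle_breaker: bool,
}

struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    // render-thread side: apply an incoming control message
    fn handle(&mut self, msg: ControlMessage) {
        match msg {
            ControlMessage::MarkCycleBreaker { node_id } => {
                if let Some(n) = self.nodes.iter_mut().find(|n| n.id == node_id) {
                    n.is_cycle_breaker = true;
                }
            }
        }
    }
}
```

The graph sorting code can then consult `node.is_cycle_breaker` directly, without any trait-level hook.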

Collaborator Author

Yup, I agree that's not very clean, especially as it only concerns one (weird) node; it was just the only place I found :)


@b-ma
Collaborator Author

b-ma commented Aug 10, 2022

Would you like me to take the work from here and spend some time on it, or will you have another go?

Actually, if you have ideas to iterate on, please go ahead! I can try to contribute the test I spoke about, but beyond that I'm short on ideas about how to continue.

@orottier
Owner

Okay I will have a try at all three things (can_break_cycle, tests, and correct sorting algorithm). I have some ideas

@b-ma
Collaborator Author

b-ma commented Aug 10, 2022

Okay I will have a try at all three things (can_break_cycle, tests, and correct sorting algorithm). I have some ideas

ok cool!

@b-ma
Collaborator Author

b-ma commented Aug 11, 2022

7ef0597 Nice approach, way cleaner!

@github-actions

Benchmark result:

bench_ctor
Instructions: 870928 (-0.427818%)
L1 Accesses: 1760039 (-0.276500%)
L2 Accesses: 7851 (+0.140306%)
RAM Accesses: 10326 (+0.135764%)
Estimated Cycles: 2160704 (-0.200227%)

bench_sine
Instructions: 9389581 (-0.048552%)
L1 Accesses: 14244066 (-0.037777%)
L2 Accesses: 32071 (-1.177087%)
RAM Accesses: 12474 (+0.249136%)
Estimated Cycles: 14841011 (-0.041813%)

bench_sine_gain
Instructions: 10290757 (-0.052097%)
L1 Accesses: 15560261 (-0.076431%)
L2 Accesses: 45263 (+13.25660%)
RAM Accesses: 12624 (+0.222293%)
Estimated Cycles: 16228416 (+0.096023%)

bench_sine_gain_delay
Instructions: 23680210 (-0.272988%)
L1 Accesses: 34135264 (-0.163417%)
L2 Accesses: 133714 (+8.836217%)
RAM Accesses: 13858 (+0.159005%)
Estimated Cycles: 35288864 (-0.002335%)

@b-ma
Collaborator Author

b-ma commented Aug 12, 2022

The graph test suite looks very nice!

The test fails, so the cycle breaking algo is not correct!
It's not an ideal solution, but it works. TODO make nicer
@orottier
Owner

Okay, this is conceptually finished, but I'm still looking for a more elegant graph ordering after the cycles are broken

@github-actions

Benchmark result:

bench_ctor
Instructions: 864963 (-1.109790%)
L1 Accesses: 1750706 (-0.805306%)
L2 Accesses: 7863 (+0.293367%)
RAM Accesses: 10319 (+0.067882%)
Estimated Cycles: 2151186 (-0.639850%)

bench_sine
Instructions: 9380677 (-0.143334%)
L1 Accesses: 14231151 (-0.128412%)
L2 Accesses: 32371 (-0.252673%)
RAM Accesses: 12468 (+0.200916%)
Estimated Cycles: 14829386 (-0.120110%)

bench_sine_gain
Instructions: 10279645 (-0.160021%)
L1 Accesses: 15544623 (-0.176854%)
L2 Accesses: 45716 (+14.39009%)
RAM Accesses: 12620 (+0.190537%)
Estimated Cycles: 16214903 (+0.012675%)

bench_sine_gain_delay
Instructions: 23665427 (-0.335245%)
L1 Accesses: 34115782 (-0.220396%)
L2 Accesses: 133624 (+8.762962%)
RAM Accesses: 13852 (+0.115640%)
Estimated Cycles: 35268722 (-0.059411%)

@b-ma
Collaborator Author

b-ma commented Aug 13, 2022

edit: I have completely messed around here, restarting...

I think if we have a pile of tests like:

(A)

            +---------+    +---------+   
            v         |    v         |
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
            |            |           |
            v            +-----------+
          +---+
          | 5 |
          +---+

where, if I'm right, 3 should break all cycles

To make it even more perverse and provoke the recursive stuff, we could also add connections between 4 and 2:

(B)

           +--------------------------+
           | +--------+    +--------+ |  
           v v        |    v        | |
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
            |            |           |
            v            +-----------+
          +---+
          | 5 |
          +---+

or the other way around

(C)

            +---------+     +--------+   
            v         |     v        | 
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
           | |        |             | |
           | |        +-------------+ |
           | +------------------------+           
           v            
          +---+
          | 5 |
          +---+

If I reason well (which I'm not sure I do :), in all these 3 cases 3 should break the cycle, therefore 4 should not be muted.

Something like this should result in both 4 being muted and 3 breaking the cycle:

(D)

           +--------------------------+
           | +--------+    +--------+ |  
           v v        |    v        | |
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
           | |        |             | |
           | |        +-------------+ |
           | +------------------------+           
           v            
          +---+
          | 5 |
          +---+

My impression is that with such test cases running consistently across several iterations, we can know whether the optimistic solution is enough (i.e. truncate only is OK, and I actually can't think why it should not be, but as you said it's quite hard to reason about) or whether being pessimistic is required (i.e. clear the delay's internal connection and restart ordering the graph from the beginning, considering that creating feedback delay lines is maybe not something people do so often that it would be a big issue either :)
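As a sanity check on test graph (A), a naive cycle enumeration confirms that every cycle passes through node 3, so breaking at 3 suffices. This helper is purely illustrative test scaffolding, not the crate's code:

```rust
use std::collections::HashMap;

// Naive DFS that records every cycle as the slice of the current path where
// the repeated node first appeared. Exponential in general, fine for the tiny
// test graphs sketched above.
fn find_cycles(
    edges: &HashMap<u32, Vec<u32>>,
    node: u32,
    path: &mut Vec<u32>,
    cycles: &mut Vec<Vec<u32>>,
) {
    if let Some(pos) = path.iter().position(|&n| n == node) {
        cycles.push(path[pos..].to_vec());
        return;
    }
    path.push(node);
    for &next in edges.get(&node).into_iter().flatten() {
        find_cycles(edges, next, path, cycles);
    }
    path.pop();
}
```

Encoding graph (A) as 1 → 2, 2 → {3, 5}, 3 → {2, 4}, 4 → 3 yields exactly two cycles (2↔3 and 3↔4), and node 3 sits on both.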

and hence is in a cycle. This means the Graph does not need to do any
bookkeeping - cleaner code (and possibly faster because it takes
less memory this way)
@orottier
Owner

I'm adding your examples, stay tuned!
Unfortunately I already concluded the optimistic graph sorting is incorrect at b1498a2.
That commit still has the truncate-and-continue style, but the cycle integration test consistently fails.
I tried to fix the optimistic method but could not get it to work yet; I think you need to unroll the marked, marked_temp and ordered items in a very specific way. Maybe I will succeed tomorrow.

@github-actions

Benchmark result:

bench_ctor
Instructions: 866422 (-0.387793%)
L1 Accesses: 1752533 (-0.256908%)
L2 Accesses: 7859 (+0.280720%)
RAM Accesses: 10317 (+0.048487%)
Estimated Cycles: 2152923 (-0.196046%)

bench_sine
Instructions: 9381721 (-0.080368%)
L1 Accesses: 14233983 (-0.054804%)
L2 Accesses: 30936 (-4.029781%)
RAM Accesses: 12462 (+0.120511%)
Estimated Cycles: 14824833 (-0.092832%)

bench_sine_gain
Instructions: 10280670 (-0.102767%)
L1 Accesses: 15551282 (-0.084102%)
L2 Accesses: 40424 (+1.374260%)
RAM Accesses: 12614 (+0.103166%)
Estimated Cycles: 16194892 (-0.061062%)

bench_sine_gain_delay
Instructions: 23712323 (-0.137747%)
L1 Accesses: 34214482 (+0.069870%)
L2 Accesses: 129691 (+5.098907%)
RAM Accesses: 13849 (+0.065029%)
Estimated Cycles: 35347652 (+0.157724%)

@orottier
Owner

L2 access seems to be improved with b85de76
In that vein I opened #208 to explore further optimizations

let in_cycle = self.in_cycle.load(Ordering::SeqCst);

let latest_frame_written = self.latest_frame_written.load(Ordering::SeqCst);
let in_cycle = latest_frame_written != scope.current_frame;
Collaborator Author

@b-ma b-ma Aug 14, 2022

I'm not that sure this is always right when in a cycle, actually:

  • if not in a cycle, the order of rendering is guaranteed, therefore we can trust this check
  • but when in a cycle, the order of rendering between Reader and Writer is not guaranteed anymore (at least it wasn't before, did it change somehow?), so the check should fail sometimes?

I actually didn't manage to make a test crash, and I see your sort_cycle_breaker test, but it feels a bit weird to me; I don't understand what could have changed here (and I'm pretty sure I saw this behaviour in the past, which is why there was this loop in node::delay::tests::test_node_stays_alive_long_enough)

Owner

@orottier orottier Aug 14, 2022

Hmmmm right, it is quite nuanced actually and my code needs to be improved:

  • When not in a cycle, there is an edge from writer to reader, so the writer always renders first
  • When in a cycle, that edge is dropped. The nature of the cycle means the output of the reader feeds into the writer, hence the reader always renders first
  • However, when the user breaks the cycle manually by dropping some other connections, the order is no longer guaranteed and thus random

The last case is a bit of an edge case of course, but I could make the in_cycle property sticky (once you are in a cycle, treat it as if you will always stay in that cycle) so we prevent erratic behaviour in rendering of sub-quantum delays.

Another issue I realize is that when the Writer is dropped, it will stop updating the latest_frame_written. This causes the Reader to set in_cycle true, disallowing sub-quantum delays. If the delay is set to 100 samples this behaviour is not right because the final 28 frames of the last Writer call should still render directly in the first 28 frames of the next Reader output.

Pfft, not simple stuff, I will iterate again
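The sticky in_cycle idea could be sketched like this, with names modeled on the diff above but otherwise hypothetical (the real crate wires this through the render scope):

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

// Once the reader ever observes that the writer has not rendered the current
// frame yet (i.e. the delay sits in a cycle and the reader ran first), it
// latches `in_cycle` and never re-enables sub-quantum delays, even if the
// render order later becomes writer-first again.
struct DelayReaderState {
    latest_frame_written: AtomicU64,
    in_cycle: AtomicBool,
}

impl DelayReaderState {
    // called by the reader at the start of each render quantum
    fn observe(&self, current_frame: u64) -> bool {
        let written = self.latest_frame_written.load(Ordering::SeqCst);
        if written != current_frame {
            self.in_cycle.store(true, Ordering::SeqCst); // latch: sticky
        }
        self.in_cycle.load(Ordering::SeqCst)
    }
}
```

Note this sketch still has the Writer-dropped problem described above: a vanished writer stops updating latest_frame_written, which wrongly latches the flag.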

@b-ma
Collaborator Author

b-ma commented Aug 14, 2022

Unfortunately I already concluded the optimistic graph sorting is incorrect at b1498a2
This still has the truncate and continue style, but the cycle integration test consistently fails

oh ok, I didn't see the sub comment in the commit message, so pessimistic we must be indeed!

L2 access seems to be improved with b85de76
In that vein I opened #208 to explore further optimizations

Nice one! (perf is really a trap :)

From discussion at orottier#198

Another issue I realize is when the Writer is dropped, it will stop
updating the latest_frame_written. This causes the Reader to set
in_cycle true, disallowing sub-quantum delays. If the delay is set to
100 samples this behaviour is not right because the final 28 frames of
the last Writer call should still render directly in the first 28 frames
of the next Reader output.
Once you are in a cycle, treat as if you will always stay in that
cycle, so we prevent erratic behaviour in rendering of sub-quantum
delays.
@github-actions

Benchmark result:

bench_ctor
Instructions: 862548 (-0.432532%)
L1 Accesses: 1745591 (-0.278611%)
L2 Accesses: 7845 (-0.025487%)
RAM Accesses: 10354 (+0.174149%)
Estimated Cycles: 2147206 (-0.197865%)

bench_sine
Instructions: 9350215 (-0.084579%)
L1 Accesses: 14182275 (-0.067426%)
L2 Accesses: 30526 (+0.315478%)
RAM Accesses: 12495 (+0.120192%)
Estimated Cycles: 14772230 (-0.057940%)

bench_sine_gain
Instructions: 10232730 (-0.106817%)
L1 Accesses: 15473990 (-0.084477%)
L2 Accesses: 38711 (+0.454121%)
RAM Accesses: 12647 (+0.102897%)
Estimated Cycles: 16110190 (-0.072901%)

bench_sine_gain_delay
Instructions: 23656078 (+0.309289%)
L1 Accesses: 34120555 (+0.677087%)
L2 Accesses: 130593 (+6.730251%)
RAM Accesses: 13886 (+0.086493%)
Estimated Cycles: 35259530 (+0.774733%)

@@ -186,7 +186,8 @@ impl AudioProcessor for ConstantSourceRenderer {
current_time += dt;
}

true
// tail_time false when output has ended this quantum
stop_time > next_block_time
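The one-line change in the diff, returning the tail-time flag from the stop-time comparison, can be sketched with the quantum arithmetic spelled out. The constant and function names here are illustrative; the real renderer derives these values from its render scope:

```rust
// Web Audio renders in fixed blocks of 128 frames.
const RENDER_QUANTUM_SIZE: u32 = 128;

// Time at which the next render quantum starts, given the current block's
// start time and the sample rate.
fn next_block_time(current_time: f64, sample_rate: f64) -> f64 {
    current_time + RENDER_QUANTUM_SIZE as f64 / sample_rate
}

// The processor keeps the node alive (tail_time == true) only while its stop
// time lies beyond the end of the current render quantum.
fn keeps_alive(stop_time: f64, current_time: f64, sample_rate: f64) -> bool {
    stop_time > next_block_time(current_time, sample_rate)
}
```

At 44100 Hz a quantum lasts about 2.9 ms, so a stop time inside the current block makes the function return false and the node can be reclaimed after this quantum.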
Collaborator Author

I remembered that we had to output one channel of silence before returning false, but I can't find it in the spec actually... I think the check at line 165 makes no sense anymore then?

Collaborator Author

handled in e088c95

@github-actions

Benchmark result:

bench_ctor
Instructions: 862548 (-0.432532%)
L1 Accesses: 1745595 (-0.278212%)
L2 Accesses: 7843 (-0.089172%)
RAM Accesses: 10352 (+0.154799%)
Estimated Cycles: 2147130 (-0.201954%)

bench_sine
Instructions: 9350215 (-0.084579%)
L1 Accesses: 14181997 (-0.068674%)
L2 Accesses: 30804 (+0.887564%)
RAM Accesses: 12495 (+0.136240%)
Estimated Cycles: 14773342 (-0.052743%)

bench_sine_gain
Instructions: 10232730 (-0.106817%)
L1 Accesses: 15473720 (-0.084827%)
L2 Accesses: 38983 (+0.590907%)
RAM Accesses: 12645 (+0.102913%)
Estimated Cycles: 16111210 (-0.071557%)

bench_sine_gain_delay
Instructions: 23656078 (+0.309289%)
L1 Accesses: 34121108 (+0.688026%)
L2 Accesses: 130041 (+3.624107%)
RAM Accesses: 13885 (+0.093714%)
Estimated Cycles: 35257288 (+0.732418%)

@github-actions

Benchmark result:

bench_ctor
Instructions: 862548 (-0.432532%)
L1 Accesses: 1745594 (-0.278041%)
L2 Accesses: 7844 (-0.127324%)
RAM Accesses: 10352 (+0.154799%)
Estimated Cycles: 2147134 (-0.202511%)

bench_sine
Instructions: 9350215 (-0.084579%)
L1 Accesses: 14182097 (-0.067434%)
L2 Accesses: 30704 (+0.310366%)
RAM Accesses: 12495 (+0.136240%)
Estimated Cycles: 14772942 (-0.057505%)

bench_sine_gain
Instructions: 10232730 (-0.106817%)
L1 Accesses: 15473727 (-0.083794%)
L2 Accesses: 38974 (+0.172206%)
RAM Accesses: 12647 (+0.118746%)
Estimated Cycles: 16111242 (-0.075152%)

bench_sine_gain_delay
Instructions: 23651203 (+0.267890%)
L1 Accesses: 34111993 (+0.633090%)
L2 Accesses: 131281 (+5.935848%)
RAM Accesses: 13885 (+0.086499%)
Estimated Cycles: 35254373 (+0.719379%)

@orottier
Owner

Thanks for the update, I think we have done enough in this PR.
I added two more test cases; one remark:

Something like that should result in both 4 muted and 3 breaks cycle: (D)

Because 2 is also part of the cycle with 4, it is muted as well

@orottier orottier merged commit 8e3455f into orottier:main Aug 15, 2022
@b-ma b-ma deleted the feature/zero-delay branch November 4, 2023 06:43