
Enable sub quantum delay #198

Merged
merged 23 commits into orottier:main from feature/zero-delay on Aug 15, 2022

Conversation

b-ma
Collaborator

@b-ma b-ma commented Aug 5, 2022

Hey, here is a first draft on #79

Don't merge this yet; I'm really not sure it's finished, but it would be nice to have your feedback on what I have done so far. As this is related to the graph, I'm a bit nervous because I'm not sure I fully understand the visit/order stuff :)

  • I had to remove a test which is now failing; it should be reviewed because it was checking against clamped delay values. I'm not sure why it breaks, I need to dig more into that.
  • I'd also like to add more tests to be sure things behave as expected.

Apart from the one I mentioned, all the other tests seem to be happy.

@b-ma
Collaborator Author

b-ma commented Aug 9, 2022

Ok I fixed the issue and added some more tests. Let me know if you are ok with the strategy I used in the graph!

// store node id to clear the node outgoing edges
cycle_breakers.push(node_id);
// remove nodes from mark temp after pos
marked_temp.truncate(pos);
Collaborator Author

I'm not completely sure of this

Owner

Yeah, it's quite hard to reason about. My gut feeling is that this implementation is too optimistic. It would be safer to clear out marked_temp and also remove these nodes from marked, then start visiting again from the first item that was in marked_temp.

Collaborator Author

Maybe a nice thing to have would be a unit test with a really weird edge case: some cycle inside a cycle, I guess, to check this behaves as expected (whatever the solution)?
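For illustration, the strategy under discussion can be sketched as a temporary-mark DFS. This is a simplified stand-in for the crate's actual graph code, and the cycle-breaker scan (searching the detected cycle for a node that may break it) is an assumption about the intended behaviour, not the real implementation:

```rust
use std::collections::HashMap;

// Depth-first ordering sketch: `marked_temp` holds the current DFS path, so
// re-visiting a node on that path means we closed a cycle. The cycle is the
// slice marked_temp[pos..]; we scan it for a node that can break the cycle
// (e.g. a delay) and record it in `cycle_breakers`.
fn order_nodes(
    edges: &HashMap<u32, Vec<u32>>,
    can_break: &dyn Fn(u32) -> bool,
    node: u32,
    marked: &mut Vec<u32>,
    marked_temp: &mut Vec<u32>,
    ordered: &mut Vec<u32>,
    cycle_breakers: &mut Vec<u32>,
) {
    if let Some(pos) = marked_temp.iter().position(|&n| n == node) {
        // cycle detected: marked_temp[pos..] is the cycle; find a breaker in it
        if let Some(&breaker) = marked_temp[pos..].iter().find(|&&n| can_break(n)) {
            cycle_breakers.push(breaker);
        }
        return;
    }
    if marked.contains(&node) {
        return; // already fully visited via another path
    }
    marked.push(node);
    marked_temp.push(node);
    for &child in edges.get(&node).into_iter().flatten() {
        order_nodes(edges, can_break, child, marked, marked_temp, ordered, cycle_breakers);
    }
    // node fully visited: drop the temporary mark, prepend to the ordering
    marked_temp.retain(|&n| n != node);
    ordered.insert(0, node);
}
```

With a tiny graph 1 → 2 → 3 → 2 where only node 3 can break cycles, a traversal from 1 records 3 as the breaker and orders the nodes 1, 2, 3.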

Owner

@orottier orottier left a comment

Great to see we are nearing a solution for this issue.
Very happy to see the extensive tests; they will help us iterate on the solution while checking that all works well.
I dropped two notes, but I do not have a solution ready at hand. Would you like me to take the work from here and spend some time on it, or will you have another go?

@@ -26,6 +26,10 @@ pub struct RenderScope {
///
/// Check the `examples/worklet.rs` file for example usage of this trait.
pub trait AudioProcessor: Send {
fn can_break_cycle(&self) -> bool {
Owner

I understand why you have put this here, but adding this to the AudioProcessor is a big thing because it forces all our users to have a look at it when implementing custom processors. Let's try to put this somewhere else
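For reference, the draft approach being discussed would look roughly like this trimmed-down sketch (the crate's real trait has many more required methods; only the part relevant to this review comment is shown):

```rust
// Sketch of the draft: a defaulted method on the processor trait, which only
// the delay-reader style node overrides. The review argues against this shape
// because every author of a custom processor now encounters the method.
trait AudioProcessor: Send {
    fn can_break_cycle(&self) -> bool {
        false // default: ordinary nodes cannot break a cycle
    }
}

// hypothetical delay node: the single processor that opts in
struct DelayReader;

impl AudioProcessor for DelayReader {
    fn can_break_cycle(&self) -> bool {
        true
    }
}
```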

Owner

We cannot really use downcasting for this problem (e.g. https://stackoverflow.com/a/33687996) without adding an as_any method to all processors, which is undesirable too.

Another solution would be to explicitly store is_cycle_breaker in the Node, and set that value with a call
ConcreteBaseAudioContext::markCycleBreaker(&self, node_id: &AudioNodeId) that passes the info to the render thread
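A rough sketch of that alternative, with all names hypothetical (the crate's real control-to-render message enum and node representation differ):

```rust
// The control thread marks a node as a cycle breaker and ships that flag to
// the render thread over the existing message channel, instead of asking the
// processor trait. All identifiers here are illustrative.
enum ControlMessage {
    MarkCycleBreaker { node_id: u64 },
    // ...other message variants elided
}

struct Node {
    id: u64,
    is_cycle_breaker: bool,
}

struct Graph {
    nodes: Vec<Node>,
}

impl Graph {
    // render-thread side: apply an incoming control message
    fn handle(&mut self, msg: ControlMessage) {
        match msg {
            ControlMessage::MarkCycleBreaker { node_id } => {
                if let Some(n) = self.nodes.iter_mut().find(|n| n.id == node_id) {
                    n.is_cycle_breaker = true;
                }
            }
        }
    }
}
```

The graph sorting code can then consult `node.is_cycle_breaker` directly, without any trait-level hook.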

Collaborator Author

Yup, I agree that's not very clean, especially as it only concerns one (weird) node; it was just the only place I found :)


@b-ma
Collaborator Author

b-ma commented Aug 10, 2022

Would you like me to take the work from here and spend some time on it, or will you have another go?

Actually, if you have ideas to iterate on, please go ahead! I can try to contribute the test I spoke about, but beyond that I'm short on ideas about how to continue.

@orottier
Owner

Okay I will have a try at all three things (can_break_cycle, tests, and correct sorting algorithm). I have some ideas

@b-ma
Collaborator Author

b-ma commented Aug 10, 2022

Okay I will have a try at all three things (can_break_cycle, tests, and correct sorting algorithm). I have some ideas

ok cool!

@b-ma
Collaborator Author

b-ma commented Aug 11, 2022

7ef0597 Nice approach, way cleaner!

@github-actions

Benchmark result:

bench_ctor
Instructions: 870928 (-0.427818%)
L1 Accesses: 1760039 (-0.276500%)
L2 Accesses: 7851 (+0.140306%)
RAM Accesses: 10326 (+0.135764%)
Estimated Cycles: 2160704 (-0.200227%)

bench_sine
Instructions: 9389581 (-0.048552%)
L1 Accesses: 14244066 (-0.037777%)
L2 Accesses: 32071 (-1.177087%)
RAM Accesses: 12474 (+0.249136%)
Estimated Cycles: 14841011 (-0.041813%)

bench_sine_gain
Instructions: 10290757 (-0.052097%)
L1 Accesses: 15560261 (-0.076431%)
L2 Accesses: 45263 (+13.25660%)
RAM Accesses: 12624 (+0.222293%)
Estimated Cycles: 16228416 (+0.096023%)

bench_sine_gain_delay
Instructions: 23680210 (-0.272988%)
L1 Accesses: 34135264 (-0.163417%)
L2 Accesses: 133714 (+8.836217%)
RAM Accesses: 13858 (+0.159005%)
Estimated Cycles: 35288864 (-0.002335%)

@b-ma
Collaborator Author

b-ma commented Aug 12, 2022

The graph test suite looks very nice!

The test fails, so the cycle breaking algo is not correct!
It's not an ideal solution, but it works. TODO make nicer
@orottier
Owner

Okay, this is conceptually finished, but I'm still looking for a more elegant graph ordering after the cycles are broken

@github-actions

Benchmark result:

bench_ctor
Instructions: 864963 (-1.109790%)
L1 Accesses: 1750706 (-0.805306%)
L2 Accesses: 7863 (+0.293367%)
RAM Accesses: 10319 (+0.067882%)
Estimated Cycles: 2151186 (-0.639850%)

bench_sine
Instructions: 9380677 (-0.143334%)
L1 Accesses: 14231151 (-0.128412%)
L2 Accesses: 32371 (-0.252673%)
RAM Accesses: 12468 (+0.200916%)
Estimated Cycles: 14829386 (-0.120110%)

bench_sine_gain
Instructions: 10279645 (-0.160021%)
L1 Accesses: 15544623 (-0.176854%)
L2 Accesses: 45716 (+14.39009%)
RAM Accesses: 12620 (+0.190537%)
Estimated Cycles: 16214903 (+0.012675%)

bench_sine_gain_delay
Instructions: 23665427 (-0.335245%)
L1 Accesses: 34115782 (-0.220396%)
L2 Accesses: 133624 (+8.762962%)
RAM Accesses: 13852 (+0.115640%)
Estimated Cycles: 35268722 (-0.059411%)

@b-ma
Collaborator Author

b-ma commented Aug 13, 2022

edit: I have completely messed around here, restarting...

I think if we have a pile of tests like:

(A)

            +---------+    +---------+   
            v         |    v         |
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
            |            |           |
            v            +-----------+
          +---+
          | 5 |
          +---+

where, if I'm right, 3 should break all cycles

To make it even more perverse and provoke the recursive stuff, we could also add connections between 4 and 2:

(B)

           +--------------------------+
           | +--------+    +--------+ |  
           v v        |    v        | |
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
            |            |           |
            v            +-----------+
          +---+
          | 5 |
          +---+

or the other way around

(C)

            +---------+     +--------+   
            v         |     v        | 
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
           | |        |             | |
           | |        +-------------+ |
           | +------------------------+           
           v            
          +---+
          | 5 |
          +---+

If I reason well (which I'm not sure I do :), in all these 3 cases 3 should break the cycle, therefore 4 should not be muted.

Something like this should result in both 4 being muted and 3 breaking the cycle:

(D)

           +--------------------------+
           | +--------+    +--------+ |  
           v v        |    v        | |
+---+     +---+     +----------+   +---+
| 1 | --> | 2 | --> | 3: delay |   | 4 |
+---+     +---+     +----------+   +---+
           | |        |             | |
           | |        +-------------+ |
           | +------------------------+           
           v            
          +---+
          | 5 |
          +---+

My impression is that with such test cases running consistently across several iterations, we can know whether the optimistic solution is enough (i.e. truncate only is OK, and I actually can't think why it should not be, but as you said it's quite hard to reason about) or whether being pessimistic is required (i.e. clear the delay's internal connection and restart ordering the graph from the beginning, considering that creating feedback delay lines is maybe not something people do so often that it would be a big issue either :)
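As a sanity check on test graph (A), a naive cycle enumeration confirms that every cycle passes through node 3, so breaking at 3 suffices. This helper is purely illustrative test scaffolding, not the crate's code:

```rust
use std::collections::HashMap;

// Naive DFS that records every cycle as the slice of the current path where
// the repeated node first appeared. Exponential in general, fine for the tiny
// test graphs sketched above.
fn find_cycles(
    edges: &HashMap<u32, Vec<u32>>,
    node: u32,
    path: &mut Vec<u32>,
    cycles: &mut Vec<Vec<u32>>,
) {
    if let Some(pos) = path.iter().position(|&n| n == node) {
        cycles.push(path[pos..].to_vec());
        return;
    }
    path.push(node);
    for &next in edges.get(&node).into_iter().flatten() {
        find_cycles(edges, next, path, cycles);
    }
    path.pop();
}
```

Encoding graph (A) as 1 → 2, 2 → {3, 5}, 3 → {2, 4}, 4 → 3 yields exactly two cycles (2↔3 and 3↔4), and node 3 sits on both.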

and hence is in a cycle. This means the Graph does not need to do any
bookkeeping - cleaner code (and possibly faster because it takes
less memory this way)
@orottier
Owner

I'm adding your examples, stay tuned!
Unfortunately I already concluded the optimistic graph sorting is incorrect at b1498a2.
That commit still has the truncate-and-continue style, but the cycle integration test consistently fails.
I tried to fix the optimistic method but could not get it to work yet; I think you need to unroll the marked, marked_temp and ordered items in a very specific way. Maybe I will succeed tomorrow.

@github-actions

Benchmark result:

bench_ctor
Instructions: 866422 (-0.387793%)
L1 Accesses: 1752533 (-0.256908%)
L2 Accesses: 7859 (+0.280720%)
RAM Accesses: 10317 (+0.048487%)
Estimated Cycles: 2152923 (-0.196046%)

bench_sine
Instructions: 9381721 (-0.080368%)
L1 Accesses: 14233983 (-0.054804%)
L2 Accesses: 30936 (-4.029781%)
RAM Accesses: 12462 (+0.120511%)
Estimated Cycles: 14824833 (-0.092832%)

bench_sine_gain
Instructions: 10280670 (-0.102767%)
L1 Accesses: 15551282 (-0.084102%)
L2 Accesses: 40424 (+1.374260%)
RAM Accesses: 12614 (+0.103166%)
Estimated Cycles: 16194892 (-0.061062%)

bench_sine_gain_delay
Instructions: 23712323 (-0.137747%)
L1 Accesses: 34214482 (+0.069870%)
L2 Accesses: 129691 (+5.098907%)
RAM Accesses: 13849 (+0.065029%)
Estimated Cycles: 35347652 (+0.157724%)

@orottier
Owner

L2 access seems to be improved with b85de76
In that vein I opened #208 to explore further optimizations

let in_cycle = self.in_cycle.load(Ordering::SeqCst);

let latest_frame_written = self.latest_frame_written.load(Ordering::SeqCst);
let in_cycle = latest_frame_written != scope.current_frame;
Collaborator Author

@b-ma b-ma Aug 14, 2022

I'm not that sure this is always right when in a cycle, actually:

  • if not in a cycle, the order of rendering is guaranteed, therefore we can trust this check
  • but when in a cycle, the order of rendering between Reader and Writer is not guaranteed anymore (at least it wasn't before, did it change somehow?), so the check should fail sometimes?

I actually didn't manage to make a test crash, and I see your sort_cycle_breaker test, but it feels a bit weird to me; I don't understand what could have changed here (and I'm pretty sure I saw this behaviour in the past, which is why there was this loop in node::delay::tests::test_node_stays_alive_long_enough)

Owner

@orottier orottier Aug 14, 2022

Hmmmm right, it is quite nuanced actually and my code needs to be improved:

  • When not in a cycle, there is an edge from writer to reader, so the writer always renders first
  • When in a cycle, that edge is dropped. The nature of the cycle means the output of the reader feeds into the writer, hence the reader always renders first
  • However, when the user breaks the cycle manually by dropping some other connections, the order is no longer guaranteed and thus random

The last case is a bit of an edge case of course, but I could make the in_cycle property sticky (once you are in a cycle, treat it as if you will always stay in that cycle) so we prevent erratic behaviour in rendering of sub-quantum delays.

Another issue I realize is that when the Writer is dropped, it will stop updating the latest_frame_written. This causes the Reader to set in_cycle true, disallowing sub-quantum delays. If the delay is set to 100 samples this behaviour is not right because the final 28 frames of the last Writer call should still render directly in the first 28 frames of the next Reader output.

Pfft, not simple stuff, I will iterate again
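The sticky in_cycle idea could be sketched like this, with names modeled on the diff above but otherwise hypothetical (the real crate wires this through the render scope):

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

// Once the reader ever observes that the writer has not rendered the current
// frame yet (i.e. the delay sits in a cycle and the reader ran first), it
// latches `in_cycle` and never re-enables sub-quantum delays, even if the
// render order later becomes writer-first again.
struct DelayReaderState {
    latest_frame_written: AtomicU64,
    in_cycle: AtomicBool,
}

impl DelayReaderState {
    // called by the reader at the start of each render quantum
    fn observe(&self, current_frame: u64) -> bool {
        let written = self.latest_frame_written.load(Ordering::SeqCst);
        if written != current_frame {
            self.in_cycle.store(true, Ordering::SeqCst); // latch: sticky
        }
        self.in_cycle.load(Ordering::SeqCst)
    }
}
```

Note this sketch still has the Writer-dropped problem described above: a vanished writer stops updating latest_frame_written, which wrongly latches the flag.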

@b-ma
Collaborator Author

b-ma commented Aug 14, 2022

Unfortunately I already concluded the optimistic graph sorting is incorrect at b1498a2
This still has the truncate and continue style, but the cycle integration test consistently fails

oh ok, I didn't see the sub comment in the commit message, so pessimistic we must be indeed!

L2 access seems to be improved with b85de76
In that vein I opened #208 to explore further optimizations

Nice one! (perf is really a trap :)

From discussion at orottier#198

Another issue I realize is when the Writer is dropped, it will stop
updating the latest_frame_written. This causes the Reader to set
in_cycle true, disallowing sub-quantum delays. If the delay is set to
100 samples this behaviour is not right because the final 28 frames of
the last Writer call should still render directly in the first 28 frames
of the next Reader output.
Once you are in a cycle, treat as if you will always stay in that
cycle, so we prevent erratic behaviour in rendering of sub-quantum
delays.
@github-actions

Benchmark result:

bench_ctor
Instructions: 862548 (-0.432532%)
L1 Accesses: 1745591 (-0.278611%)
L2 Accesses: 7845 (-0.025487%)
RAM Accesses: 10354 (+0.174149%)
Estimated Cycles: 2147206 (-0.197865%)

bench_sine
Instructions: 9350215 (-0.084579%)
L1 Accesses: 14182275 (-0.067426%)
L2 Accesses: 30526 (+0.315478%)
RAM Accesses: 12495 (+0.120192%)
Estimated Cycles: 14772230 (-0.057940%)

bench_sine_gain
Instructions: 10232730 (-0.106817%)
L1 Accesses: 15473990 (-0.084477%)
L2 Accesses: 38711 (+0.454121%)
RAM Accesses: 12647 (+0.102897%)
Estimated Cycles: 16110190 (-0.072901%)

bench_sine_gain_delay
Instructions: 23656078 (+0.309289%)
L1 Accesses: 34120555 (+0.677087%)
L2 Accesses: 130593 (+6.730251%)
RAM Accesses: 13886 (+0.086493%)
Estimated Cycles: 35259530 (+0.774733%)

@@ -186,7 +186,8 @@ impl AudioProcessor for ConstantSourceRenderer {
current_time += dt;
}

true
// tail_time false when output has ended this quantum
stop_time > next_block_time
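The one-line change in the diff, returning the tail-time flag from the stop-time comparison, can be sketched with the quantum arithmetic spelled out. The constant and function names here are illustrative; the real renderer derives these values from its render scope:

```rust
// Web Audio renders in fixed blocks of 128 frames.
const RENDER_QUANTUM_SIZE: u32 = 128;

// Time at which the next render quantum starts, given the current block's
// start time and the sample rate.
fn next_block_time(current_time: f64, sample_rate: f64) -> f64 {
    current_time + RENDER_QUANTUM_SIZE as f64 / sample_rate
}

// The processor keeps the node alive (tail_time == true) only while its stop
// time lies beyond the end of the current render quantum.
fn keeps_alive(stop_time: f64, current_time: f64, sample_rate: f64) -> bool {
    stop_time > next_block_time(current_time, sample_rate)
}
```

At 44100 Hz a quantum lasts about 2.9 ms, so a stop time inside the current block makes the function return false and the node can be reclaimed after this quantum.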
Collaborator Author

I remembered that we had to output one channel of silence before returning false, but I can't find it in the spec actually... I think the check at line 165 makes no sense anymore then?

Collaborator Author

handled in e088c95

@github-actions

Benchmark result:

bench_ctor
Instructions: 862548 (-0.432532%)
L1 Accesses: 1745595 (-0.278212%)
L2 Accesses: 7843 (-0.089172%)
RAM Accesses: 10352 (+0.154799%)
Estimated Cycles: 2147130 (-0.201954%)

bench_sine
Instructions: 9350215 (-0.084579%)
L1 Accesses: 14181997 (-0.068674%)
L2 Accesses: 30804 (+0.887564%)
RAM Accesses: 12495 (+0.136240%)
Estimated Cycles: 14773342 (-0.052743%)

bench_sine_gain
Instructions: 10232730 (-0.106817%)
L1 Accesses: 15473720 (-0.084827%)
L2 Accesses: 38983 (+0.590907%)
RAM Accesses: 12645 (+0.102913%)
Estimated Cycles: 16111210 (-0.071557%)

bench_sine_gain_delay
Instructions: 23656078 (+0.309289%)
L1 Accesses: 34121108 (+0.688026%)
L2 Accesses: 130041 (+3.624107%)
RAM Accesses: 13885 (+0.093714%)
Estimated Cycles: 35257288 (+0.732418%)

@github-actions

Benchmark result:

bench_ctor
Instructions: 862548 (-0.432532%)
L1 Accesses: 1745594 (-0.278041%)
L2 Accesses: 7844 (-0.127324%)
RAM Accesses: 10352 (+0.154799%)
Estimated Cycles: 2147134 (-0.202511%)

bench_sine
Instructions: 9350215 (-0.084579%)
L1 Accesses: 14182097 (-0.067434%)
L2 Accesses: 30704 (+0.310366%)
RAM Accesses: 12495 (+0.136240%)
Estimated Cycles: 14772942 (-0.057505%)

bench_sine_gain
Instructions: 10232730 (-0.106817%)
L1 Accesses: 15473727 (-0.083794%)
L2 Accesses: 38974 (+0.172206%)
RAM Accesses: 12647 (+0.118746%)
Estimated Cycles: 16111242 (-0.075152%)

bench_sine_gain_delay
Instructions: 23651203 (+0.267890%)
L1 Accesses: 34111993 (+0.633090%)
L2 Accesses: 131281 (+5.935848%)
RAM Accesses: 13885 (+0.086499%)
Estimated Cycles: 35254373 (+0.719379%)

@orottier
Owner

Thanks for the update, I think we have done enough in this PR.
I added two more test cases; one remark:

Something like that should result in both 4 muted and 3 breaks cycle: (D)

Because 2 is also part of the cycle with 4, it is muted as well

@orottier orottier merged commit 8e3455f into orottier:main Aug 15, 2022
@b-ma b-ma deleted the feature/zero-delay branch November 4, 2023 06:43