Only send flushes when Downstairs is idle; send Barrier otherwise #1505

mkeeter · 2024-10-14T17:54:26Z

This PR removes automatic flushes, per RFD 518. Instead, the new Barrier operation is sent. If the system is idle for a particular amount of time, we send a final Flush to put everything into a known state.

When the Upstairs retires jobs after a barrier operation, the system as a whole becomes ineligible for replay. This state determines whether the new Downstairs reconnects through Offline (which does replay) or Faulted (which does live-repair instead).
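The replay rule above can be sketched as a small state tracker. This is only an illustration, not the code in this PR: `ReplayTracker`, `Reconnect`, and the method names are hypothetical, and the assumption that a retired Flush makes the system eligible for replay again is mine rather than something stated here.

```rust
/// How a returning Downstairs is brought back in sync.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Reconnect {
    /// Reconnect through Offline and replay the jobs it missed.
    Replay,
    /// Reconnect through Faulted and run live-repair instead.
    LiveRepair,
}

/// Hypothetical tracker for whether replay is still possible.
#[derive(Debug)]
struct ReplayTracker {
    can_replay: bool,
}

impl ReplayTracker {
    fn new() -> Self {
        ReplayTracker { can_replay: true }
    }

    /// Jobs were retired after a Barrier: those jobs can no longer be
    /// re-sent, so the system as a whole becomes ineligible for replay.
    fn retire_after_barrier(&mut self) {
        self.can_replay = false;
    }

    /// ASSUMPTION: retiring jobs after a Flush puts everything into a
    /// known state, so replay becomes possible again.
    fn retire_after_flush(&mut self) {
        self.can_replay = true;
    }

    /// Pick the reconnect path for a Downstairs that comes back.
    fn reconnect(&self) -> Reconnect {
        if self.can_replay {
            Reconnect::Replay
        } else {
            Reconnect::LiveRepair
        }
    }
}
```

In this sketch the flag lives on the system as a whole rather than on an individual Downstairs, matching the wording "the system as a whole becomes ineligible for replay".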

Removing automatic flushes is a noticeable performance improvement:

(tested on the London mini-cluster, with Upstairs and 3x Downstairs on different sleds)

region info:
  block size:      4096 bytes
  blocks / extent: 16384
  extent size:     64 MiB
  extent count:    2048
  total blocks:    33554432
  total size:      128 GiB
  encryption:      yes
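
For reference, the derived fields in the region info follow directly from the block size, blocks per extent, and extent count. A minimal sketch of that arithmetic, with a hypothetical `RegionInfo` struct that is not the Crucible type:

```rust
const MIB: u64 = 1024 * 1024;
const GIB: u64 = 1024 * MIB;

/// Hypothetical holder for the three independent region parameters.
struct RegionInfo {
    block_size: u64,        // bytes per block
    blocks_per_extent: u64, // blocks in one extent
    extent_count: u64,      // extents in the region
}

impl RegionInfo {
    /// Bytes in a single extent.
    fn extent_size_bytes(&self) -> u64 {
        self.block_size * self.blocks_per_extent
    }

    /// Blocks across all extents.
    fn total_blocks(&self) -> u64 {
        self.blocks_per_extent * self.extent_count
    }

    /// Bytes across the whole region.
    fn total_size_bytes(&self) -> u64 {
        self.total_blocks() * self.block_size
    }
}
```

With `block_size = 4096`, `blocks_per_extent = 16384`, and `extent_count = 2048`, this gives 64 MiB per extent, 33554432 blocks, and 128 GiB in total, matching the listing above.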

mkeeter · 2024-10-15T20:41:06Z

Rebased to stage on top of #1507, because we only want to send Barrier operations if there are enough bytes or jobs buffered (which requires keeping track!)
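
As a rough sketch of the policy described so far (Flush only after an idle period, Barrier once enough jobs or bytes are buffered), the decision could look like the following. Everything here is hypothetical: `SyncAction`, `Thresholds`, `choose_action`, and the threshold fields are illustrative names, and the real thresholds and call sites live in the Upstairs code.

```rust
use std::time::Duration;

/// What the Upstairs should send next, if anything.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SyncAction {
    Nothing,
    Barrier,
    Flush,
}

/// Hypothetical tuning knobs; the values are chosen by the caller.
struct Thresholds {
    max_jobs: u64,          // buffered jobs before a Barrier is sent
    max_bytes: u64,         // buffered bytes before a Barrier is sent
    idle_timeout: Duration, // idle time before a final Flush is sent
}

/// Decide between Flush, Barrier, and doing nothing.
fn choose_action(
    buffered_jobs: u64,
    buffered_bytes: u64,
    idle_for: Duration,
    t: &Thresholds,
) -> SyncAction {
    if buffered_jobs == 0 {
        // Nothing is buffered, so there is nothing to retire.
        SyncAction::Nothing
    } else if idle_for >= t.idle_timeout {
        // The system has gone idle: send a final Flush to put
        // everything into a known state.
        SyncAction::Flush
    } else if buffered_jobs >= t.max_jobs || buffered_bytes >= t.max_bytes {
        // Still busy, but enough work is buffered: send a Barrier so
        // that completed jobs can be retired.
        SyncAction::Barrier
    } else {
        SyncAction::Nothing
    }
}
```

Checking the idle case first is a design choice in this sketch: once the system is idle, a Flush is preferable to a Barrier because it leaves the system in a known state.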

upstairs/src/downstairs.rs

leftwo · 2024-10-28T22:38:36Z

upstairs/src/dummy_downstairs_tests.rs

+                panic!("expected Barrier, got message {m:?}");
+            }
+            harness.ds2.ack_barrier().await;
+            harness.ds3.ack_barrier().await;


Do we have any way to check that replay is no longer available to this downstairs? I'm not sure if the test has access to it at this layer.

mkeeter · 2024-10-29

The end of this unit test checks that live-repair works, so I think we're implicitly testing that we don't do replay.

leftwo · 2024-10-29

Ah, I was thinking more that sending a barrier operation has also flipped can_replay in the Downstairs struct. I'm not sure we can check that here (or in a test elsewhere), though, as can_replay might not be exposed like that.

mkeeter · 2024-10-30

Yeah, it's not easy to check from the outside (when we only have the Guest handle). If you really want it, we could probably add it to the downstairs_state helper function.

leftwo · 2024-10-30

I don't think it's worth adding it to the downstairs_state helper function. I don't see anywhere else that seems like a good place either. Nothing in the upstairs tests covers this case.

leftwo · 2024-10-29T20:20:14Z

With this, we can probably close #1358 as fixed

faithanalog

We should probably tighten up IO_CACHED_MAX_BYTES soon. Right now we can theoretically buffer up to 1 GiB of jobs for replay, and that feels like quite a lot to me. We usually won't: we only get to that point if the guest is doing large amounts of IO, continuously enough that we never go idle, and the guest never sends a flush. That's rare during normal filesystem operations (but will be easier to hit when writing to the raw block device like iodriver does). Because of that, we shouldn't expect more than a few VMs to hit this under normal operation. But in the interest of not overcommitting resources, I think that bound should be lower.

mkeeter requested review from jmpesp and leftwo October 14, 2024 17:54

mkeeter force-pushed the mkeeter/no-auto-flush branch 2 times, most recently from b0c092b to 3a7e2f8 Compare October 15, 2024 20:40

mkeeter changed the base branch from main to mkeeter/io-state-job-and-byte-count October 15, 2024 20:40

mkeeter force-pushed the mkeeter/io-state-job-and-byte-count branch from 7b7ec22 to 1e4cc53 Compare October 16, 2024 13:48

mkeeter force-pushed the mkeeter/no-auto-flush branch from d3e9973 to 5f94966 Compare October 16, 2024 13:48

Base automatically changed from mkeeter/io-state-job-and-byte-count to main October 16, 2024 14:20

mkeeter force-pushed the mkeeter/no-auto-flush branch 2 times, most recently from 796a397 to 2d9dba3 Compare October 17, 2024 21:32

mkeeter force-pushed the mkeeter/no-auto-flush branch from 2d9dba3 to 2709712 Compare October 28, 2024 12:37

leftwo reviewed Oct 28, 2024

mkeeter force-pushed the mkeeter/no-auto-flush branch 2 times, most recently from 5cc9448 to be0f66d Compare October 30, 2024 19:50

leftwo approved these changes Oct 30, 2024

mkeeter force-pushed the mkeeter/no-auto-flush branch from be0f66d to bcd4ba2 Compare October 31, 2024 15:33

faithanalog approved these changes Oct 31, 2024

Only send flushes when Downstairs is idle; send Barrier otherwise

bcad542

mkeeter force-pushed the mkeeter/no-auto-flush branch from bcd4ba2 to bcad542 Compare November 1, 2024 19:35
