
revisit flush_timeout #1358

Open
faithanalog opened this issue Jun 17, 2024 · 1 comment

@faithanalog (Contributor) commented Jun 17, 2024

Background:

Upstairs periodically sends flush commands to Downstairs, even if the guest is not asking for disk flushes. Originally this was every 5 seconds. I believe, though I am not sure, that it was added to prevent jobs from piling up in the upstairs queue.
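As a rough illustration of that behavior, here is a minimal sketch of a timer-driven automatic flush, assuming a tokio runtime; automatic_flush_loop and submit_flush are hypothetical names, not the actual Upstairs code:

use std::time::Duration;

// Minimal sketch only: issue a flush on the guest's behalf every
// flush_timeout_secs seconds. The real Upstairs only needs to flush when
// there is outstanding unflushed work; this just shows the cadence.
async fn automatic_flush_loop(flush_timeout_secs: f32) {
    let period = Duration::from_secs_f32(flush_timeout_secs);
    loop {
        tokio::time::sleep(period).await;
        submit_flush().await;
    }
}

// Hypothetical placeholder for submitting a flush job to the Downstairs.
async fn submit_flush() {}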

We adjusted this to 0.5 seconds back when we were still using the sqlite backend:

let flush_timeout = timeout.unwrap_or(0.5);

We adjusted it to 0.5 because, at the time, an extent flush required a lot of work clearing old metadata contexts out of the sqlite database. That work scaled directly with the number of writes that had hit an extent since the last flush, and it was causing some pretty terrible latency bubbles that we wanted to avoid. Bryan did some testing and found that 0.5 gave the best results ( #757 ). Note that this was also before fast-write-ack.

And it remains 0.5 to this day:

let flush_timeout_secs = opt.flush_timeout.unwrap_or(0.5);

This is configurable in the VCR, but Nexus is written to always pass None and accept our default.
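For illustration, the Option plumbing above behaves like this; a sketch of the unwrap_or pattern only, where 2.0 is an arbitrary example value and none of this is the actual Nexus or VCR code:

fn main() {
    // What Nexus effectively does today: pass None and take the 0.5 s default.
    let from_nexus: Option<f32> = None;
    assert_eq!(from_nexus.unwrap_or(0.5), 0.5);

    // Setting a value in the VCR would mean carrying Some(value) through instead.
    let overridden: Option<f32> = Some(2.0);
    assert_eq!(overridden.unwrap_or(0.5), 2.0);
}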

Why we might want to change it

For one thing, we are not on the sqlite backend anymore, and the new backend has different flush performance characteristics. But also, we may be sending more fsyncs to ZFS than the guest actually cares about, and that has a cost.

Questions

  • Does adjusting the flush timeout affect performance or CPU usage?
  • If it is also still serving the role of preventing job pileup, is it still the right mechanism for that?
@leftwo (Contributor) commented Jun 21, 2024

The flush timeout does still play a role in preventing job pileup. For replay to work, we keep a list of all jobs since the last acked flush, and if a downstairs goes away and comes back, we replay all the work since that last confirmed flush. The frequent flushes let us discard that work instead of having to keep it in memory.

So, we would need to find another way of allowing replay to work correctly if we want to increase the flush timeout.
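To make the tradeoff concrete, here is a minimal sketch of the bookkeeping described above; Job and ReplayBuffer are hypothetical names, not Crucible's actual data structures. The point is that a longer flush interval means more jobs sit in this buffer waiting to be discarded:

use std::collections::VecDeque;

struct Job {
    id: u64,
    // write payload elided
}

#[derive(Default)]
struct ReplayBuffer {
    // Every job issued since the last acked flush.
    unconfirmed: VecDeque<Job>,
}

impl ReplayBuffer {
    // Each new job is retained until a later flush is acked.
    fn record(&mut self, job: Job) {
        self.unconfirmed.push_back(job);
    }

    // An acked flush means everything at or before flush_id is durable on
    // that downstairs, so it no longer needs to be kept for replay.
    fn on_flush_ack(&mut self, flush_id: u64) {
        self.unconfirmed.retain(|job| job.id > flush_id);
    }

    // If a downstairs goes away and comes back, everything still retained
    // gets resent.
    fn jobs_to_replay(&self) -> &VecDeque<Job> {
        &self.unconfirmed
    }
}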
