allocator: make assignment much much faster #351

Merged: 2 commits into master from johnny/faster-alloc, Oct 12, 2023
Conversation

@jgraettinger (Contributor) commented Oct 6, 2023

When used with a large number of journals, certain graph constructions
can be quite slow, particularly during member scale down.

First update benchmark_test.go to separately model scale-up and
scale-down phases of a simulated rolling deployment, which replicates
the conditions we've observed.

Next, implement the "gap" label heuristic for the push/relabel
algorithm by tracking the number of nodes at each height < len(nodes),
and watching for a count to drop to zero as the indication that a gap
has been created. Respond to gaps by instantly re-labeling the stranded
nodes to a larger height that reconnects them to the network.
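A minimal sketch of that heuristic, under illustrative names (node,
relabel, and heightCounts are hypothetical, not this PR's identifiers):

type node struct{ height int }

// relabel moves node n to newHeight while maintaining heightCounts[h],
// the number of nodes currently at each height h < len(nodes).
func relabel(nodes []node, heightCounts []int, n, newHeight int) {
	var old = nodes[n].height
	if old < len(nodes) {
		heightCounts[old]--
	}
	nodes[n].height = newHeight
	if newHeight < len(nodes) {
		heightCounts[newHeight]++
	}
	// If the count at the old height zeroed, a "gap" has opened: nodes
	// with old < height < len(nodes) can no longer reach the sink, so
	// lift them past len(nodes) in one step rather than letting
	// push/relabel raise them one unit of height at a time.
	if old < len(nodes) && heightCounts[old] == 0 {
		for i := range nodes {
			if h := nodes[i].height; h > old && h < len(nodes) {
				heightCounts[h]--
				nodes[i].height = len(nodes) + 1
			}
		}
	}
}

The one-shot lift replaces the many unit-height relabelings that each
stranded node would otherwise require.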

On my machine, with the current (toy) parameters, the original
algorithm had a max-flow round which required >4.5s to complete. Now,
the same network completes in 27ms. The improvement compounds with
graph complexity, as the original runtime grows combinatorially.

Update some tests to use "testing" instead of go-check. Except where
tests have been extended, all scenario tests are essentially unchanged.

Finally, bound the number of items that participate in a single max-flow
network, and compose a final desired assignment solution by solving multiple
independent max flow problems, with disjoint subsets of items and all
members.

This makes the big-O runtime of assignment linear in the number of
items, which generally dominates the number of members.
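Sketched in Go, the decomposition might look like the following (Item,
Member, Assignment, and solveNetwork are hypothetical stand-ins; only
the itemsPerNetwork constant appears in the PR):

const itemsPerNetwork = 10000

type (
	Item       struct{ Name string }
	Member     struct{ Name string }
	Assignment struct{ Item, Member string }
)

// solveNetwork stands in for building and solving one max-flow network
// over a chunk of items and *all* members.
func solveNetwork(items []Item, members []Member) []Assignment { return nil }

// solveAssignments partitions items into chunks of at most
// itemsPerNetwork and solves an independent network per chunk, so
// total work is linear in the number of items.
func solveAssignments(items []Item, members []Member) []Assignment {
	var out []Assignment
	for begin := 0; begin < len(items); begin += itemsPerNetwork {
		var end = begin + itemsPerNetwork
		if end > len(items) {
			end = len(items)
		}
		// Each sub-problem is independent: disjoint items, all members.
		out = append(out, solveNetwork(items[begin:end], members)...)
	}
	return out
}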

Testing:

  • Refinements of scenario tests
  • A lot of local testing using the benchmark included in the package. I tuned and scaled up the benchmark parameters to ramp up the size of the problem, up to a simulated rolling deployment of 600 members, 1MM items, and 3MM assignments, which completed successfully. At this scale, the bottleneck is Etcd itself.


@psFried (Contributor) left a comment

This was interesting, and required a few trips to the push-relabel Wikipedia page before I felt like I understood what was going on here. I'm sure that I still don't fully understand all of this, and I left a few questions just to make sure I'm not completely lost.

The gap heuristic makes sense, though, and so does the splitting of the problem into smaller groups of items. And FWIW I didn't notice anything that seemed wrong.

// instead of combinatorial over members and items.
const itemsPerNetwork = 10000

// NOTE(johnny): this could trivially be parallelized if needed.
@psFried (Contributor) commented:
Just for my own understanding: I don't understand how this could be trivially parallelized while still respecting the item limit for each member. Wouldn't each process need to know the current number of assignments for each member in order to respect their item limits?

@jgraettinger (Author) replied:
Answered below.

Re: parallelization, State is unchanged while building and evaluating
the max-flow network, so multiple goroutines can work in parallel. I
didn't do it yet because it's extra SLOC we don't appear to need, and I
didn't know for sure that the memory impact would be negligible.
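A minimal sketch of that parallelization, assuming the read-only State
described above and reusing the hypothetical Item, Member, Assignment,
and solveNetwork stand-ins from the earlier sketch (requires the
standard "sync" package):

// solveParallel solves each item chunk on its own goroutine. Because
// shared state is only read while building and evaluating a network,
// the sub-problems need no coordination beyond gathering results.
func solveParallel(chunks [][]Item, members []Member) []Assignment {
	var results = make([][]Assignment, len(chunks))
	var wg sync.WaitGroup
	for i, chunk := range chunks {
		wg.Add(1)
		go func(i int, chunk []Item) {
			defer wg.Done()
			results[i] = solveNetwork(chunk, members)
		}(i, chunk)
	}
	wg.Wait()

	var out []Assignment
	for _, r := range results {
		out = append(out, r...)
	}
	return out
}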

func (fs *sparseFlowNetwork) buildMemberArc(mf *pr.MaxFlow, id pr.NodeID, member int) []pr.Arc {
	var c = memberAt(fs.Members, member).ItemLimit()
	// Constrain to the scaled ItemLimit for our portion of the global assignment problem.
	c = scaleAndRound(c, len(fs.myItems), len(fs.Items))
@psFried (Contributor) commented:
Another check for understanding: I think this line addresses my previous question about respecting item limits when treating this as several smaller networks, yeah?

@jgraettinger (Author) replied:
That's right. Each sub-problem gets a corresponding fraction of the total member capacity.
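A rough reconstruction of what that scaling could look like, based only
on the call site above (the actual scaleAndRound in the PR may round
differently):

// scaleAndRound scales a member's ItemLimit by the fraction of total
// items that this sub-network covers, rounding up so that integer
// truncation never starves a sub-network of capacity. (Illustrative
// reconstruction; the rounding direction is an assumption.)
func scaleAndRound(limit, myItems, totalItems int) int {
	if totalItems == 0 {
		return 0
	}
	return (limit*myItems + totalItems - 1) / totalItems // ceiling division
}

For example, a member with ItemLimit 100, in a sub-network holding
10,000 of 1,000,000 total items, contributes capacity
ceil(100 * 10,000 / 1,000,000) = 1 to that network.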

@jgraettinger merged commit dfed675 into master on Oct 12, 2023, with 1 check passed, and deleted the johnny/faster-alloc branch on October 12, 2023 at 13:27.