Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upstream inconsistent graph state fix #58

Closed
wants to merge 7 commits into from

Conversation

mikejholly
Copy link
Contributor

@mikejholly mikejholly commented May 2, 2024

Selected commits from: moby/buildkit#4887 & dependent changes.

tonistiigi and others added 3 commits May 2, 2024 12:53
When different LLB vertexes (eg. parallel requests referencing
local sources from different sessions) generate same cache keys
during solve they are merged together into a single operation.

Currently, when this happened the progress for the vertex that
was dropped got lost. This fixes this case by adding the
progressWriter of the redirected vertex as a target to the
source one.

This should also work with multiple levels of merged edges,
just multiple nested multiwriters as well.

Signed-off-by: Tonis Tiigi <[email protected]>
Before this, it was possible for an edge merge to happen to a target
edge/state that is no longer in the actives map, e.g.
1. Job A solves some edges
2. Job B solves some edges that get merged with job A's edges
3. Job A is discarded
4. Job C solves some edges that get merged with job B's edges, which
   recursively end up merging to job A's, which no longer exist in the
   actives map.

While this doesn't always result in an error, given the right state and
order of operations this can result in `getState` or `getEdge` being
called on the "stale" edges that are inactive and an error. E.g. if a
stale dep transitions from desired state `cache-fast` to `cache-slow`,
the slow cache logic will call `getState` on it and return a `compute
cache` error.

I also *suspect* this is the same root cause behind `inconsistent graph
state` errors still seen occasionally, but have not repro'd that error
locally so can't be 100% sure yet.

The fix here updates `state.setEdge` to register all the jobs in the
source edge's state with the target edge's state. This works out
because:
1. edges that are the target of a merge will not have their state
   removed from the actives map if only the original job creating them
   is discarded
1. those edges are still removed eventually, but only when all jobs
   referencing them (now including jobs referencing them via edge
   merges) are discarded

Signed-off-by: Erik Sipsma <[email protected]>
@mikejholly mikejholly changed the title solver: fix printing progress messages after merged edges Upstream inconsistent graph state fix May 2, 2024
@mikejholly mikejholly closed this May 17, 2024
@mikejholly mikejholly deleted the mh/solver-state-fix branch May 17, 2024 04:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants