Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix potential race in Spawn#getLocalResources() #24985

Closed
wants to merge 1 commit into from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Jan 20, 2025

Both implementations performed two reads of the nullable instance variable, which could be reordered so that the first read returns a non-null value and the second read returns null.

@fmeum fmeum requested a review from katre January 20, 2025 12:29
@fmeum fmeum marked this pull request as ready for review January 20, 2025 12:29
@github-actions github-actions bot added team-Performance Issues for Performance teams awaiting-review PR is awaiting review from an assigned reviewer labels Jan 20, 2025
@fmeum fmeum requested review from justinhorvitz and removed request for katre January 20, 2025 12:29
@fmeum
Copy link
Collaborator Author

fmeum commented Jan 20, 2025

CI failure appears to be unrelated.

Both implementations performed two reads of the nullable instance variable, which could be reordered so that the first read returns a non-null value and the second read returns null.
@meisterT
Copy link
Member

I am fine with the code change, but do not follow how a race can happen here. Can you please elaborate?

@justinhorvitz
Copy link
Contributor

I am fine with the code change, but do not follow how a race can happen here. Can you please elaborate?

Agree, I don't think reordering in manner suggested is possible.

@fmeum
Copy link
Collaborator Author

fmeum commented Jan 21, 2025

I am fine with the code change, but do not follow how a race can happen here. Can you please elaborate?

I may be missing something, so please doublecheck: Looking at

  private ResourceSet localResourcesCached;

  ...

  public ResourceSet getLocalResources() throws ExecException {
    if (localResourcesCached == null) {
      // Not expected to be called concurrently, and an idempotent computation if it is.
      localResourcesCached = localResourcesSupplier.get();
    }
    return localResourcesCached;
  }

the local variable isn't volatile, so writes to it (and thus reads from it) are not ordered. The getLocalResources function reads from it two times (first line and last line) and could get back null both times, following this example. I can't get this to happen on my Apple Silicon MacBook in JCStress, but I don't think it's safe as per the JMM.

@justinhorvitz
Copy link
Contributor

I'm not really sure what that article is trying to say. Maybe it's just showing examples that technically aren't guaranteed to behave a certain way per the specification of the java memory model, even if in practice they do. There's no concern that this code will ever give the wrong result, but it's fine to proceed with this change.

@cushon
Copy link
Contributor

cushon commented Jan 21, 2025

I think the original code is the same as the buggy double-checked-locking implementation discussed here, except without the locking? This might be OK if it doesn't need to be thread-safe, but the implementation comment doesn't seem very confident of that:

Not expected to be called concurrently, and an idempotent computation if it is.

@coeuvre
Copy link
Member

coeuvre commented Jan 21, 2025

I think Spawn is not supposed to be thread-safe: it is assigned to only one thread for execution. I checked the call sites of getLocalResources and there are not concurrent calls.

@fmeum
Copy link
Collaborator Author

fmeum commented Jan 21, 2025

I think Spawn is not supposed to be thread-safe: it is assigned to only one thread for execution. I checked the call sites of getLocalResources and there are not concurrent calls.

Couldn't this end up being called concurrently with dynamic execution and --dynamic_remote_strategy set to a non-remote strategy? It's only a race in theory to begin with and this particular option being set to something other than remote probably also happens only very rarely, but it's not statically obvious to me that this isn't called concurrently.

@coeuvre
Copy link
Member

coeuvre commented Jan 21, 2025

Couldn't this end up being called concurrently with dynamic execution and --dynamic_remote_strategy set to a non-remote strategy?

In practice, only local branch calls this method -- it's for local execution anyway.

@meisterT meisterT added awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally and removed awaiting-review PR is awaiting review from an assigned reviewer labels Jan 24, 2025
@github-actions github-actions bot removed the awaiting-PR-merge PR has been approved by a reviewer and is ready to be merge internally label Jan 24, 2025
@fmeum fmeum deleted the fix-resource-set branch January 25, 2025 16:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
team-Performance Issues for Performance teams
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants