fix: lock shared cache directory #1888

Merged
merged 4 commits into from
Sep 13, 2024

@lengau lengau commented Sep 9, 2024

Locks the shared cache directory to prevent concurrency issues.

Fixes #1845

CRAFT-3313

@lengau lengau force-pushed the work/1845/no-parallel branch 7 times, most recently from 16a5f34 to 2d9d3df on September 9, 2024 22:42
@lengau lengau marked this pull request as ready for review September 11, 2024 13:58
fs: pyfakefs.fake_filesystem.FakeFilesystem, fake_path, simple_charm
) -> services.CharmcraftServiceFactory:
fake_project_dir = fake_path / "project"
def service_factory(simple_charm, new_path) -> services.CharmcraftServiceFactory:
Collaborator Author

This stops using the fake filesystem and uses the real FS. I think it was a mistake for me to use pyfakefs for integration tests.

Collaborator Author

These changes are to deal with the change from using pyfakefs to using a real path.

}

with provider.instance(**provider_kwargs) as instance:
    # Because we've already locked the cache, we shouldn't see the lockfile.
Contributor

I'm confused here - why isn't the cache visible? Did the call to _maybe_lock_cache() inside provider.instance() fall into the except OSError: codepath?

Collaborator Author

Yes, because we run _maybe_lock_cache on line 36. Looking back, I realise I named the function for the success case and tested for the failure case. I'll fix that.
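
For readers following the thread, the fallback being discussed can be sketched roughly as below. This is a minimal illustration and not charmcraft's actual code: the function name `maybe_lock_cache`, the `cache.lock` filename, and the return convention are all assumptions made for the example. It attempts a non-blocking exclusive `flock()` and, on `OSError`, reports failure so that a caller could skip mounting the shared cache.

```python
import fcntl
import pathlib
import tempfile
from typing import IO, Optional


def maybe_lock_cache(cache_dir: pathlib.Path) -> Optional[IO[str]]:
    """Try to lock the cache directory without blocking.

    Returns the open lock file on success (keep it open to hold the lock),
    or None if the lock is already held through another open of the file.
    Hypothetical helper, for illustration only.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    lock_file = (cache_dir / "cache.lock").open("w")  # assumed filename
    try:
        # LOCK_NB makes flock raise (an OSError subclass) instead of waiting.
        fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        lock_file.close()
        return None
    return lock_file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        holder = maybe_lock_cache(pathlib.Path(tmp))
        print("first attempt locked:", holder is not None)
        print("second attempt locked:", maybe_lock_cache(pathlib.Path(tmp)) is not None)
        if holder is not None:
            holder.close()  # closing the file releases the lock
```

In this sketch the lock lives exactly as long as the returned file object stays open, which is why the caller has to hold on to it for the duration of the build.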

Contributor

This is surprising to me, because I thought flock() was per-process, so a process trying to flock a file it already holds would be a no-op.
If this isn't the case, what happens if we do multiple consecutive managed builds, like a build plan with multiple entries?

Collaborator Author

You're right! I wasn't checking for that properly, either. However, it's not quite per-process either. It's per file descriptor, so the same process can't lock the same path in two places if it's got the file open separately twice. (This also requires making two separate Path objects, as pathlib will cache the file descriptor for the same object, as I learnt the hard way.)

I've updated it now with a much better lock and test.
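
The locking behaviour described here can be checked with a small experiment. The sketch below is only an illustration (the lock filename is made up and this is not the test from the PR). It relies on the Linux `flock(2)` behaviour that a lock belongs to the open file description: re-locking through the same open file succeeds, while a second, separate `open()` of the same path in the same process conflicts.

```python
import fcntl
import os
import tempfile


def try_lock(file_obj) -> bool:
    """Attempt an exclusive, non-blocking flock on an already-open file."""
    try:
        fcntl.flock(file_obj, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        return False
    return True


def demo():
    """Return (first, same_file_again, separate_open, after_unlock)."""
    with tempfile.TemporaryDirectory() as tmp:
        lock_path = os.path.join(tmp, "example.lock")  # hypothetical name
        # Two independent opens of the same path within one process.
        with open(lock_path, "w") as first, open(lock_path, "w") as second:
            first_ok = try_lock(first)
            again_ok = try_lock(first)    # same open file: already ours
            second_ok = try_lock(second)  # separate open: conflicts
            fcntl.flock(first, fcntl.LOCK_UN)
            after_unlock = try_lock(second)
    return first_ok, again_ok, second_ok, after_unlock


if __name__ == "__main__":
    print(demo())
```

On Linux this should report that the first and repeated attempts succeed, the attempt through the second open fails, and it succeeds once the first lock is released. For consecutive managed builds in one process, that suggests either reusing the same open lock file or releasing it between builds.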

Contributor

@mattculler mattculler left a comment

Looks good to me. If this is an issue that users see frequently then a nicer solution might be more fine-grained cache locking control, so multiple instances of charmcraft could run with the shared cache at once, but they couldn't both use the same package (or whatever the things inside this cache are) at the same time.
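
As a rough idea of what finer-grained locking could look like, the sketch below takes one lock file per cache entry instead of one for the whole directory. Everything in it is hypothetical: the `entry_lock` helper, the `.locks` subdirectory, and the entry names are invented for illustration, and nothing here reflects how the shared cache is actually laid out.

```python
import contextlib
import fcntl
import pathlib
import tempfile


@contextlib.contextmanager
def entry_lock(cache_dir: pathlib.Path, entry: str, *, blocking: bool = True):
    """Hold an exclusive flock for a single cache entry while the block runs.

    Hypothetical helper: one lock file per entry under `<cache_dir>/.locks`.
    `entry` is assumed to be a plain name without path separators.
    """
    lock_dir = cache_dir / ".locks"  # assumed layout
    lock_dir.mkdir(parents=True, exist_ok=True)
    flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
    with (lock_dir / f"{entry}.lock").open("w") as lock_file:
        # With blocking=False this raises an OSError subclass if the entry
        # is locked through another open of the same lock file.
        fcntl.flock(lock_file, flags)
        yield
        # Leaving the inner `with` closes the file, which releases the lock.


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cache = pathlib.Path(tmp)
        with entry_lock(cache, "pkg-a"):
            # A different entry can be locked at the same time.
            with entry_lock(cache, "pkg-b", blocking=False):
                print("pkg-a and pkg-b held together")
```

With a scheme like this, two charmcraft instances would only wait on each other when they touch the same entry, at the cost of more lock files and more care around lock ordering.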

@lengau lengau requested a review from tigarmo September 12, 2024 17:56
@lengau lengau force-pushed the work/1845/no-parallel branch from 5b6a9f3 to 7e134dd on September 12, 2024 18:03
@lengau lengau force-pushed the work/1845/no-parallel branch 3 times, most recently from 538fa94 to 325e741 on September 13, 2024 21:41
@lengau lengau merged commit 4cce11b into hotfix/3.2 Sep 13, 2024
18 checks passed
@lengau lengau deleted the work/1845/no-parallel branch September 13, 2024 22:31