-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rust: Replace keep_fingerprints
logic with unconditional cargo clean
#63
base: main
Are you sure you want to change the base?
Conversation
The downside of this change is that local crates must always be fully rebuilt. ie. caching is now only effective for 3rd party dependencies. |
Interesting idea Ian. This is related to https://github.com/ijc/earthly-lib/tree/rust-avoid-caching-local-path-dependencies/rust#keep_fingerprints-false. |
Hrm, yes, there is some overlap for sure. My take would be that it is better to use formal APIs ( I would expect this also produces a space saving since I think removing the fingerprint doesn't remove the actual build artifacts, they will never be reused due to the lack of fingerprint so it is wasteful to leave them in the cache? Whereas clean will remove the actual binary blobs too. |
rust/Earthfile
Outdated
cargo $args; \ | ||
if [ "$EARTHLY_DO_NOT_CACHE_PATH_BASED_DEPENDENCIES" = "true" ]; then \ | ||
cargo metadata --format-version=1 --no-deps | jq -r '.packages[] | select(.id | startswith("path+file://")) | .name' | xargs -I{} cargo clean -p {}; \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is too complicated and doesn't match "path" deps like it claims, it's just matching all local crates in an complicated way (since it is matching the crate location not something to do with the dep tree). It could be replaced with .packages[].name
which is exactly the same due to --no-deps
.
I think the path based dependencies thing is a red-herring, since we'd want to not cache any local crate since even the crate being built could suffer the false target cache hit problem, not only the things it depends on.
If the principal here is one we want to go fortward with then I'll update to simplify and rename/reword things.
919b55d
to
8362c22
Compare
Agree
What if we remove the fingerprints option and we always avoid caching path-based deps? |
I think that would be fine. I'd probably leave an option to disable the clean still, but have it do the clean by default. I'm not sure, but I think with this the |
I don't see a reason anymore neither |
Actually, in trying to write a doc update as part of this I realised I couldn't really put into succinct words the circumstances where this option would be useful -- it's a pretty subtle case and my suspicion is that any attempt to use it would more than likely be wrong. Maybe I'll just make it unconditional. |
This avoids producing: ``` +lint | no files found within ./target matching the provided output regexp +lint | find: '/tmp/earthly/lib/rust': No such file or directory ``` when there is no output, because `copy-output.sh` only creates that path `if [ -n \"\$1\" ]`. Apply the same condition to the `rename-output.sh` end of things.
The logs otherwise just contain: ``` --> RUN set -e; cargo $args; cargo sweep -r -t $EARTHLY_SWEEP_DAYS; cargo sweep -r -i; $EARTHLY_FUNCTIONS_HOME/copy-output.sh "$output"; ``` which doesn't show what is actually being run.
Due to Earthly's layer cache code added with `COPY` (even with `--keep-ts`) can end up with timestamps (`mtime`) corresponding to the point of creation of the cache entry, not the current time. However on a following build the `target` mount cache may contain builds from other branches, with different code for those dependencies, which have a newer `mtime`. In this case `cargo` will think it can use the cached dependency instead of rebuilding (because the code appears older than the cached entry under `target`). Avoid this by using `cargo clean` to remove the build artifacts for any local crate. This should become unnecessary with rust-lang/cargo#14136 This replaces the old behaviour of removing the fingerprints directory. Using `cargo clean` uses a proper cargo API rather than relying on implementation details like where the fingerprints live and what the consequence removing them is. It may also keep the cached data smaller since it removes the build artifacts which will likely never be reused due to the lack of fingerprint. Note that the previous fingerprint cleaning was subject to a race where a different parallel build could reintroduce some fingerprints between `DO +REMOVE_SOURCE_FINGERPRINTS` and the `RUN ... cargo $args`. For that reason the calls to `cargo clean` here are made within the same `RUN` command so that the target cache remains locked. By switching to `cargo metadata` the requirement for `tomljson` is removed.
8362c22
to
d24d25c
Compare
keep_fingerprints
logic with unconditional cargo clean
echo "+CARGO: copying output"; \ | ||
$EARTHLY_FUNCTIONS_HOME/copy-output.sh "$output"; \ | ||
echo "+CARGO: removing local crates from target cache"; \ | ||
cargo metadata --format-version=1 --no-deps | /tmp/jq -r '.packages[].name' | xargs -I{} cargo clean -p {}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The call to cargo clean
needs to match the profile used by cargo $args
e.g. needs to include --release
or --profile=foo
as needed. Or I need to figure out a way to say "all profiles".
I'll have a think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't find a way to get a list of interesting profiles to clean. Other than "ask the user to supply it" or something ugly involving inspecting $args
I can't currently see a way to make this approach work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is further complicated by tools such as cargo llvm-cov
which overrides target dir (to target/llvm-cov-target/
).
Due to Earthly's layer cache code added with
COPY
(even with--keep-ts
) canend up with timestamps (
mtime
) corresponding to the point of creation of thecache entry, not the current time.
However on a following build the
target
mount cache may contain builds fromother branches, with different code for those dependencies, which have a newer
mtime
. In this casecargo
will think it can use the cached dependencyinstead of rebuilding (because the code appears older than the cached entry
under
target
).Avoid this by using
cargo clean
to remove the build artifacts for any localcrate. This should become unnecessary with rust-lang/cargo#14136
This replaces the old behaviour of removing the fingerprints directory. Using
cargo clean
uses a proper cargo API rather than relying on implementationdetails like where the fingerprints live and what the consequence removing them
is. It may also keep the cached data smaller since it removes the build
artifacts which will likely never be reused due to the lack of fingerprint.
Note that the previous fingerprint cleaning was subject to a race where a
different parallel build could reintroduce some fingerprints between
DO +REMOVE_SOURCE_FINGERPRINTS
and theRUN ... cargo $args
. For that reason thecalls to
cargo clean
here are made within the sameRUN
command so that thetarget cache remains locked.
By switching to
cargo metadata
the requirement fortomljson
is removed.This is an alternative to #62.
I also included a couple of QOL changes which I think are useful.