Adds +INIT and $OS_RELEASE to cache id (#25)
This PR:
- Includes the distro information (from `/etc/os-release`) in the default
cache id value. This avoids reusing cache entries across potentially
incompatible shared libraries.
- Includes a `+INIT` UDC used to initialize the build environment for
the rest of the UDCs to be invoked.
- Exports the cache id to the build environment as
`$CARGO_HOME_CACHE_ID`.
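A minimal shell sketch of how the default cache id described above is composed. It mirrors the `ARG OS_RELEASE` and `ARG cache_id` lines of `+INIT` in the Earthfile diff. The function name `default_cache_id` and its optional second argument are hypothetical. The argument exists only so the sketch can hash a file other than `/etc/os-release`. Inside Earthly, the project part comes from the builtin `EARTHLY_TARGET_PROJECT_NO_TAG` arg.

```shell
#!/bin/sh
# Hypothetical helper mirroring the default cache id built by +INIT:
#   "${EARTHLY_TARGET_PROJECT_NO_TAG}#${OS_RELEASE}#earthly-cargo-cache"
# where OS_RELEASE is the md5 digest of /etc/os-release.
default_cache_id() {
  project="$1"                             # stands in for EARTHLY_TARGET_PROJECT_NO_TAG
  os_release_file="${2:-/etc/os-release}"  # overridable so the sketch can be tested
  # md5sum prints "<digest>  <file>"; keep only the digest, as the Earthfile does.
  os_release=$(md5sum "$os_release_file" | cut -d ' ' -f 1)
  printf '%s#%s#earthly-cargo-cache\n' "$project" "$os_release"
}
```

The digest changes whenever `/etc/os-release` differs. As a result, two base images (say, Debian bookworm and Alpine) end up with separate `$CARGO_HOME` caches. To mimic the exported variable outside Earthly, one could run `export CARGO_HOME_CACHE_ID="$(default_cache_id github.com/example/project)"`, where the project path is a placeholder.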

---------

Co-authored-by: Vlad A. Ionescu <[email protected]>
idelvall and vladaionescu authored Nov 15, 2023
1 parent 489a6a4 commit 1135c5c
Showing 2 changed files with 146 additions and 80 deletions.
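The `+INIT` and `+CHECK_INITED` commands in the Earthfile diff that follows share configuration through plain files. `+INIT` refuses to run twice and writes each setting to its own file under `/earthly/cfg`. The other UDCs first check that `cache_id` exists, then read their settings back with `cat`. Below is a minimal shell sketch of that protocol. It assumes a configurable directory so it can run outside Earthly. The `CFG_DIR` variable and the `init_cfg`/`check_inited` function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical stand-in for /earthly/cfg so the sketch can run anywhere.
CFG_DIR="${CFG_DIR:-/earthly/cfg}"

# Mirrors +INIT: fail if already initialized, otherwise persist one value per file.
init_cfg() {
  cache_id="$1"; keep_fingerprints="${2:-false}"; sweep_days="${3:-4}"
  if [ -f "$CFG_DIR/cache_id" ]; then
    echo "+INIT has already been called in this build environment" >&2
    return 1
  fi
  mkdir -p "$CFG_DIR"
  echo "$cache_id" > "$CFG_DIR/cache_id"
  echo "$keep_fingerprints" > "$CFG_DIR/keep_fingerprints"
  echo "$sweep_days" > "$CFG_DIR/sweep_days"
}

# Mirrors +CHECK_INITED: the other UDCs call this before reading any setting.
check_inited() {
  if [ ! -f "$CFG_DIR/cache_id" ]; then
    echo "+INIT has not been called yet in this build environment" >&2
    return 1
  fi
}
```

A caller such as `+CARGO` would then read a setting with, for example, `keep_fingerprints=$(cat "$CFG_DIR/keep_fingerprints")`. Storing one value per file keeps the consuming side down to a single `cat`.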
139 changes: 84 additions & 55 deletions rust/Earthfile
@@ -1,24 +1,51 @@
VERSION --global-cache 0.7

# INIT stores the configuration required for the other UDCs in the filesystem, and installs required dependencies.
# - cache_id: Overrides default ID of the global $CARGO_HOME cache. Its value is exported to the build environment under the entry: $CARGO_HOME_CACHE_ID
# - keep_fingerprints (false): Instructs the following +CARGO calls not to remove the Cargo fingerprints of the source packages. Use only when source packages have been COPYed with the --keep-ts option.
# - sweep_days (4): +CARGO calls use cargo-sweep to clean build artifacts that haven't been accessed for this number of days.
INIT:
COMMAND
RUN if [ -f /earthly/cfg/cache_id ]; then \
echo "+INIT has already been called in this build environment" ; \
exit 1; \
fi
DO +INSTALL_CARGO_SWEEP
RUN mkdir -p /earthly/cfg

# cache_id
ARG EARTHLY_TARGET_PROJECT_NO_TAG
ARG OS_RELEASE=$(md5sum /etc/os-release | cut -d ' ' -f 1)
ARG cache_id="${EARTHLY_TARGET_PROJECT_NO_TAG}#${OS_RELEASE}#earthly-cargo-cache"
RUN echo "$cache_id">/earthly/cfg/cache_id
ENV CARGO_HOME_CACHE_ID=$cache_id

#keep_fingerprints
ARG keep_fingerprints=false
RUN echo "$keep_fingerprints">/earthly/cfg/keep_fingerprints

#sweep_days
ARG sweep_days=4
RUN echo "$sweep_days">/earthly/cfg/sweep_days

# CARGO runs the cargo command "cargo $args".
# This UDC should be thread safe. Parallel builds of targets using it should be free of race conditions.
# This UDC is thread safe. Parallel builds of targets calling this UDC should be free of race conditions.
# Notice that in order to run this UDC, +INIT must be called first.
# Arguments:
# - args: Cargo subcommand and its arguments. Required.
# - keep_fingerprints (false): Do not remove source packages fingerprints. Use only when source packages have been COPYed with --keep-ts option.
# - sweep_days (4): The UDC uses cargo-sweep to clean build artifacts that haven't been accessed for this number of days.
# - output: Regex to match the files within the target folder to be copied from the cache to the caller filesystem (image layers).
# Use this argument when you want to SAVE an ARTIFACT from the target folder (mounted cache), always trying to minimize the total size of the copied fileset.
# For example --output="release/[^\./]+" would keep all the files in /target/release that don't have any extension.
CARGO:
COMMAND
DO +CHECK_INITED
ARG --required args
ARG keep_fingerprints=false
ARG sweep_days=4
ARG keep_fingerprints=$(cat /earthly/cfg/keep_fingerprints)
ARG sweep_days=$(cat /earthly/cfg/sweep_days)
ARG output
IF [ "$keep_fingerprints" = "false" ]
DO +REMOVE_SOURCE_FINGERPRINTS
END
DO +INSTALL_CARGO_SWEEP
DO +RUN_WITH_CACHE --command="set -e;
echo \"Running cargo $args\" ;
cargo $args;
@@ -38,34 +65,43 @@ CARGO:
mv /earthly_lib_rust_temp/* target 2>/dev/null || echo "no files found within ./target matching the provided output regexp" ;
END

# REMOVE_SOURCE_FINGERPRINTS removes the Cargo fingerprint folders of the source packages.
# This guarantees Cargo compiles the packages when COPY commands of the source folders have a static timestamp (see --keep-ts).
REMOVE_SOURCE_FINGERPRINTS:
# RUN_WITH_CACHE runs the passed command with the CARGO caches mounted.
# Notice that in order to run this UDC, +INIT must be called first.
# Arguments:
# - command (required): Command to run, can be any expression.
#
RUN_WITH_CACHE:
COMMAND
COPY +get-tomljson/tomljson /tmp/tomljson
COPY +get-jq/jq /tmp/jq
DO +RUN_WITH_CACHE --command="set -e;
source_libs=\$(find . -name Cargo.toml -exec bash -c '/tmp/tomljson {} | /tmp/jq -r .package.name; printf \"\\n\"' \\;) ;
fingerprint_folders=\$(find target -name .fingerprint) ;
echo \"deleting fingerprints:\";
for fingerprint_folder in \$fingerprint_folders; do
cd \$fingerprint_folder;
for source_lib in \$source_libs; do
find . -maxdepth 1 -regex \"\./\$source_lib-[^-]+\" -exec bash -c 'readlink -f {}; rm -rf {}' \; ;
done
done"
DO +CHECK_INITED
ARG --required command
ARG cache_id = $(cat /earthly/cfg/cache_id)
# Save to restore at the end.
ARG ORIGINAL_CARGO_HOME=$CARGO_HOME
ARG ORIGINAL_CARGO_INSTALL_ROOT=$CARGO_INSTALL_ROOT
# Make sure that crates installed through this UDC are stored in the original cargo home, and not in the cargo home within the mount cache.
# This way, if BK garbage-collects them, the build is not broken.
ENV CARGO_INSTALL_ROOT=$ORIGINAL_CARGO_HOME
# We change $CARGO_HOME while keeping $ORIGINAL_CARGO_HOME/bin directory in the path. This way, the Cargo binary is still accessible and the whole $CARGO_HOME is within the global cache
# ($CARGO_HOME/.package-cache has to be in the cache so Cargo can properly synchronize parallel access to $CARGO_HOME resources).
ENV CARGO_HOME="/earthly/.cargo"
RUN --mount=type=cache,mode=0777,id=$cache_id,sharing=shared,target=$CARGO_HOME \
--mount=type=cache,mode=0777,target=target \
set -e; \
mkdir -p $CARGO_HOME; \
printf "Running:\n $command\n"; \
eval $command
ENV CARGO_HOME=$ORIGINAL_CARGO_HOME
ENV CARGO_INSTALL_ROOT=$ORIGINAL_CARGO_INSTALL_ROOT

# get-tomljson gets the portable tomljson binary.
get-tomljson:
FROM alpine:3.18.3
ARG USERARCH
ARG version=2.1.0
RUN wget -O tomljson.tar.xz https://github.com/pelletier/go-toml/releases/download/v${version}/tomljson_${version}_linux_${USERARCH}.tar.xz && \
tar -xf tomljson.tar.xz && \
tar -xf tomljson.tar.xz; \
chmod +x tomljson
SAVE ARTIFACT tomljson

# get-jq gets the portable jq binary.
get-jq:
FROM alpine:3.18.3
ARG USERARCH
@@ -74,39 +110,32 @@ get-jq:
chmod +x jq
SAVE ARTIFACT jq

# INSTALL_CARGO_SWEEP installs cargo-sweep if it doesn't exist already.
INSTALL_CARGO_SWEEP:
COMMAND
RUN if [ ! -f $CARGO_HOME/bin/cargo-sweep ]; then \
echo "Installing cargo sweep" ; \
cargo install cargo-sweep --root $CARGO_HOME; \
fi;

REMOVE_SOURCE_FINGERPRINTS:
COMMAND
DO +CHECK_INITED
COPY +get-tomljson/tomljson /tmp/tomljson
COPY +get-jq/jq /tmp/jq
DO +RUN_WITH_CACHE --command="set -e;
if [ ! -f \$CARGO_HOME/bin/cargo-sweep ]; then
cargo install cargo-sweep --root \$CARGO_HOME;
fi;"
source_libs=\$(find . -name Cargo.toml -exec bash -c '/tmp/tomljson {} | /tmp/jq -r .package.name; printf \"\\n\"' \\;) ;
fingerprint_folders=\$(find target -name .fingerprint) ;
echo \"deleting fingerprints:\";
for fingerprint_folder in \$fingerprint_folders; do
cd \$fingerprint_folder;
for source_lib in \$source_libs; do
find . -maxdepth 1 -regex \"\./\$source_lib-[^-]+\" -exec bash -c 'readlink -f {}; rm -rf {}' \; ;
done
done"

# RUN_WITH_CACHE runs the passed command with the CARGO caches mounted.
# Arguments:
# - command (required): Command to run, can be any expression.
#
# This implementation is not expected to significantly change. Prefer using the `CARGO` UDC if you can, so you can get future improvements transparently.
RUN_WITH_CACHE:
CHECK_INITED:
COMMAND
ARG --required command
ARG EARTHLY_TARGET_PROJECT_NO_TAG
ARG CACHE_ID="${EARTHLY_TARGET_PROJECT_NO_TAG}#earthly-cargo-cache"
# $ORIGINAL_CARGO_HOME/bin will contain the cargo and rust binaries, as well as the installed crates.
# We save it to reset it back at the end. This location is presumed to be in the calling target layers filesystem rather than in other mount cache.
ARG ORIGINAL_CARGO_HOME=$CARGO_HOME
ARG ORIGINAL_CARGO_INSTALL_ROOT=$CARGO_INSTALL_ROOT
# Make sure that crates installed though this UDC are stored in the original cargo home, and not in the cargo home within the mount cache.
# This way, if BK garbage-collects them, the build is not broken
ENV CARGO_INSTALL_ROOT=$ORIGINAL_CARGO_HOME
# We need $CARGO_HOME/.package-cache within the earthly shared-mount-cache, so cargo can properly synchronize parallel access to $CARGO_HOME resources.
ENV CARGO_HOME="/earthly/.cargo"
RUN echo $CACHE_ID
RUN --mount=type=cache,mode=0777,id=$CACHE_ID,sharing=shared,target=/earthly \
--mount=type=cache,mode=0777,target=target \
set -e; \
mkdir -p $CARGO_HOME; \
printf "Running:\n $command\n"; \
eval $command
ENV CARGO_HOME=$ORIGINAL_CARGO_HOME
ENV CARGO_INSTALL_ROOT=$ORIGINAL_CARGO_INSTALL_ROOT
RUN if [ ! -f /earthly/cfg/cache_id ]; then \
echo "+INIT has not been called yet in this build environment" ; \
exit 1; \
fi;
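`+REMOVE_SOURCE_FINGERPRINTS` (called by `+CARGO` unless `keep_fingerprints` is true) deletes Cargo's per-package fingerprint directories. This forces packages whose `COPY` overwrote the timestamps to be rebuilt. Below is a minimal shell sketch of that step. It assumes GNU `find`, which provides `-regex`. It takes the package names as arguments, whereas the Earthfile extracts them from every `Cargo.toml` with `tomljson` and `jq`. The function name `remove_source_fingerprints` is hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the +REMOVE_SOURCE_FINGERPRINTS idea.
# Usage: remove_source_fingerprints <target_dir> <package_name>...
remove_source_fingerprints() {
  target_dir="$1"; shift
  # Cargo keeps one .fingerprint folder per profile (e.g. target/debug/.fingerprint).
  # -prune stops find from descending into folders while entries are being deleted.
  find "$target_dir" -type d -name .fingerprint -prune | while IFS= read -r fp; do
    for lib in "$@"; do
      # Same pattern as the Earthfile: "<name>-<hash>", where the hash has no '-',
      # so a package called "mylib" does not also match "mylib-extra-<hash>".
      # The subshell keeps each cd from affecting the next relative path.
      (cd "$fp" && find . -maxdepth 1 -regex "\./$lib-[^-]+" -exec rm -rf {} +)
    done
  done
}
```

Only the source packages lose their fingerprints. Registry dependencies such as `serde` keep theirs, so Cargo still reuses their cached builds from `target`.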
87 changes: 62 additions & 25 deletions rust/README.md
@@ -2,40 +2,55 @@

Earthly's official collection of rust [UDCs](https://docs.earthly.dev/docs/guides/udc).

## +CARGO

This UDC runs the cargo command `cargo $args` caching the contents of `$CARGO_HOME/registry`, `$CARGO_HOME/git` and `target` for future builds of the same calling target.

### Usage

First, import the UDC up in your Earthfile:
```earthfile
VERSION 0.7
VERSION --global-cache 0.7
IMPORT github.com/earthly/lib/rust:<version/commit> AS rust
```
> :warning: Due to [this issue](https://github.com/earthly/earthly/issues/3490), make sure to enable `--global-cache` in the calling Earthfile, as shown above.
Then, just use it in your own targets and UDCs:
```earthfile
DO rust+CARGO ...
```
## +INIT

### Thread safety
This UDC should be thread safe. Parallel builds of targets using it should be free of race conditions.
This UDC stores the configuration required by the other UDCs in the build environment filesystem, and installs required dependencies.

### Arguments
It must be called once per build environment, to avoid passing repetitive arguments to the UDCs called after it, and to install required dependencies before the source files are copied from the build context.

#### `args`
Cargo subcommand and its arguments. Required.
### Usage

#### `keep_fingerprints (false)`
Do not remove source packages fingerprints. Use only when source packages have been `COPY`ed with `--keep-ts` option.
Call once per build environment:
```earthfile
DO rust+INIT ...
```

Cargo caches compilations of packages in `target` folder based on their last modification timestamps.
### Arguments
#### `cache_id`
Overrides default ID of the global `$CARGO_HOME` cache. Its value is exported to the build environment under the entry: `$CARGO_HOME_CACHE_ID`.

#### `keep_fingerprints (false)`
Instructs the following `+CARGO` calls not to remove the Cargo fingerprints of the source packages. Use only when source packages have been `COPY`ed with the `--keep-ts` option.
Cargo caches compilations of packages in `target` folder based on their last modification timestamps.
By default, this UDC removes the fingerprints of the packages found in the source code, to force their recompilation and work even when the Earthly `COPY` commands used overwrote the timestamps.

#### `sweep_days (4)`
The UDC uses [cargo-sweep](https://github.com/holmgr/cargo-sweep) to clean build artifacts that haven't been accessed for this number of days.
`+CARGO` calls use cargo-sweep to clean build artifacts that haven't been accessed for this number of days.

## +CARGO

This UDC runs the cargo command `cargo $args` caching the contents of `$CARGO_HOME` and `target` for future builds of the same calling target.

Notice that in order to run this UDC, [+INIT](#init) must be called first.

### Usage

After calling `+INIT`, use it to wrap cargo commands:

```earthfile
DO rust+CARGO ...
```
### Arguments

#### `args`
Cargo subcommand and its arguments. Required.

#### `output`
Regex to match the files within the target folder to be copied from the cache to the caller filesystem (image layers).
@@ -44,7 +59,27 @@ Use this argument when you want to `SAVE ARTIFACT` from the target folder (mount

For example `--output="release/[^\./]+"` would keep all the files in `/target/release` that don't have any extension.

### Examples:
### Thread safety
This UDC is thread safe. Parallel builds of targets calling this UDC should be free of race conditions.

## +RUN_WITH_CACHE

`+RUN_WITH_CACHE` runs the passed command with the CARGO caches mounted.

Notice that in order to run this UDC, [+INIT](#init) must be called first.

### Arguments
#### `command (required)`
Command to run, can be any expression.

### Example
Show `$CARGO_HOME` cached-entries size:

```earthfile
DO rust-udc+RUN_WITH_CACHE --command "du \$CARGO_HOME"
```

## Complete example

Suppose the following project:
```
@@ -66,25 +101,27 @@ Suppose the following project:
The Earthfile would look like:

```earthfile
VERSION 0.7
ARG --global debian=bookworm
VERSION --global-cache 0.7
# Importing UDC definition from default branch (in a real case, specify version or commit to guarantee immutability)
IMPORT github.com/earthly/lib/rust AS rust
install:
FROM rust:1.73.0-$debian
FROM rust:1.73.0-bookworm
RUN apt-get update -qq
RUN apt-get install --no-install-recommends -qq autoconf autotools-dev libtool-bin clang cmake bsdmainutils
RUN cargo install --locked cargo-deny
RUN rustup component add clippy
RUN rustup component add rustfmt
# Call +INIT before copying the source files to avoid installing dependencies every time the source code changes.
# This parametrization will be used in future calls to UDCs of the library
DO rust+INIT --keep_fingerprints=true
source:
FROM +install
COPY --keep-ts Cargo.toml Cargo.lock ./
COPY --keep-ts deny.toml ./
COPY --dir package1 package2 ./
COPY --keep-ts --dir package1 package2 ./
# build builds with the Cargo release profile
build: