# Add generic fetcher

- Status: proposed
- Date: 2024-10-30

## Context

The main motivation for this change is to support users who need to download arbitrary files that do not fit within any
established package ecosystem that cachi2 supports or could potentially support. The target audience is users who rely
on cachi2 for hermetic builds and want an easy way to include these arbitrary files as well, with cachi2 accounting for
them in the SBOM it produces.

## Decision

This change introduces a generic fetcher, an additional cachi2 package manager. This package manager uses a custom
lockfile located in the input repository. Based on that lockfile, it downloads files, saves them to the requested
locations, and verifies their checksums. Below is a more detailed overview of the implementation.

### Lockfile format

Cachi2 expects the lockfile to be named `generic_lockfile.yaml`.
To accommodate possible future breaking changes, the lockfile contains a `metadata` section with a `version` field that
indicates the version of the lockfile format. It also contains a list of artifacts (files) to download. Each artifact
specifies a download URL, a set of checksums, and, optionally, a target location.

```yaml
metadata:
  # uses X.Y semantic versioning
  version: "1.0"
artifacts:
  - download_url: https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors?download=true
    target: granite-model-1.safetensors
    checksums:
      sha256: d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c
```

#### Lockfile properties

Below is an explanation of the individual properties of the lockfile.
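The rules in the following subsections can be checked mechanically. What follows is a minimal, hypothetical sketch and not cachi2's implementation. It makes several assumptions:

- The YAML has already been parsed into plain Python objects, for example with PyYAML's `yaml.safe_load`.
- Only lockfile versions `1.x` are accepted.
- "Cachi2-verifiable" means any fixed-length algorithm in Python's `hashlib.algorithms_guaranteed`.
- A missing `target` is derived from the last path segment of `download_url`.

The function names `derive_target`, `validate_lockfile`, and `verify_checksums` are illustrative only.

```python
# Hypothetical sketch of generic lockfile validation; not cachi2's actual code.
# Assumes the YAML has already been parsed into plain dicts and lists.
import hashlib
from pathlib import PurePosixPath
from urllib.parse import urlparse

SUPPORTED_MAJOR = 1  # assumption: this sketch accepts lockfile versions 1.x
# Assumption: "verifiable" means hashlib can compute it with a fixed-length digest.
VERIFIABLE = {a for a in hashlib.algorithms_guaranteed if not a.startswith("shake_")}


def derive_target(artifact: dict) -> PurePosixPath:
    """Use the explicit target, else the last path segment of download_url."""
    if artifact.get("target"):
        return PurePosixPath(artifact["target"])
    # urlparse() keeps the query string (e.g. "?download=true") out of .path
    name = PurePosixPath(urlparse(artifact["download_url"]).path).name
    if not name:
        raise ValueError(f"cannot derive a target from {artifact['download_url']!r}")
    return PurePosixPath(name)


def validate_lockfile(lockfile: dict) -> list[PurePosixPath]:
    """Check the lockfile structure and return each artifact's target."""
    version = str(lockfile["metadata"]["version"])
    major, _, minor = version.partition(".")
    if not (major.isdigit() and minor.isdigit()) or int(major) != SUPPORTED_MAJOR:
        raise ValueError(f"unsupported lockfile version {version!r}")

    targets = []
    for artifact in lockfile["artifacts"]:
        url = artifact.get("download_url")
        if not isinstance(url, str) or not url:
            raise ValueError("every artifact needs a download_url string")
        if not VERIFIABLE.intersection(artifact.get("checksums") or {}):
            raise ValueError(f"no verifiable checksum for {url}")
        targets.append(derive_target(artifact))

    # Targets overlap if they are equal or one is a parent directory of the other.
    for i, a in enumerate(targets):
        for b in targets[i + 1:]:
            if a == b or a in b.parents or b in a.parents:
                raise ValueError(f"overlapping targets: {a} and {b}")
    return targets


def verify_checksums(data: bytes, checksums: dict[str, str]) -> None:
    """Raise if any verifiable checksum does not match the downloaded bytes."""
    checked = 0
    for algorithm, expected in checksums.items():
        if algorithm not in VERIFIABLE:
            continue  # assumption: unknown algorithms are skipped, not rejected
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != expected.lower():
            raise ValueError(f"{algorithm} mismatch: expected {expected}, got {actual}")
        checked += 1
    if checked == 0:
        raise ValueError("no verifiable checksum provided")
```

For the example lockfile above, `validate_lockfile` returns `[PurePosixPath('granite-model-1.safetensors')]`. Without the `target` key, the derived target would be `model-00001-of-00003.safetensors`.

The sketch counts a target that is a parent directory of another target as an overlap, not just an identical one. This is one reading of "do not overlap", and it stops a file from being written where another artifact needs a directory.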

##### download_url (required)

A string containing the URL from which the artifact is downloaded.

##### checksums (required)

A dictionary mapping checksum algorithms to their expected values. At least one checksum that cachi2 can verify must be
provided, to give some degree of confidence in the identity of the artifact.

##### target (optional)

This key is provided mainly for the user's convenience, so that files end up in the expected locations. If it is not
specified, it is derived from `download_url`. The target is a path relative to the generic fetcher's output directory
(`{cachi2-output-dir}/deps/generic`). Cachi2 verifies that the target locations, including those derived from download
URLs, do not overlap.

### SBOM components

All artifacts fetched with the generic fetcher are recorded in the SBOM that cachi2 produces. Because no information
about these files can be derived beyond their download location and filename, they are always recorded as SBOM
components with a purl of type `generic`.

Additionally, each SBOM component contains [externalReferences] of type `distribution` that record the URL the file was
downloaded from. This makes the file easier to handle for tools that process the SBOM.

Here is an example SBOM generated for the file above.

```json
{
  "bomFormat": "CycloneDX",
  "components": [
    {
      "name": "granite-model-1.safetensors",
      "purl": "pkg:generic/granite-model-1.safetensors?checksums=sha256:d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c&download_url=https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
      "properties": [
        {
          "name": "cachi2:found_by",
          "value": "cachi2"
        }
      ],
      "type": "file",
      "externalReferences": [
        {
          "url": "https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
          "type": "distribution"
        }
      ]
    }
  ],
  "metadata": {
    "tools": [
      {
        "vendor": "red hat",
        "name": "cachi2"
      }
    ]
  },
  "specVersion": "1.4",
  "version": 1
}
```

## Consequences

As mentioned above, this package manager enables users to fetch arbitrary files with cachi2 and have them accounted for
in the SBOM. A possible downside is the need to maintain the lockfile format, since it is specific to cachi2. Versioning
the format should partially mitigate this.

[externalReferences]: https://cyclonedx.org/docs/1.6/json/#components_items_externalReferences
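To make the purl and `externalReferences` mapping from the SBOM components section concrete, here is a minimal sketch that assembles one component in the shape of the example SBOM. It is hypothetical, not cachi2's code, and it makes three assumptions:

- The component name is the file name of the artifact's target.
- Several checksums are joined as comma-separated `algorithm:value` pairs (the example shows only one).
- Qualifier values are written unencoded to mirror the example.

The function name `generic_component` is illustrative only.

```python
# Hypothetical sketch: build an SBOM component shaped like the example SBOM.
# Not cachi2's implementation; naming and qualifier formatting are assumptions.
import json
from pathlib import PurePosixPath


def generic_component(target: str, download_url: str, checksums: dict[str, str]) -> dict:
    """Return a CycloneDX-style component dict for one generic artifact."""
    name = PurePosixPath(target).name  # assumption: component name = target file name
    # Assumption: several checksums are joined as "alg:value" pairs with commas.
    checksum_value = ",".join(f"{alg}:{value}" for alg, value in sorted(checksums.items()))
    # Qualifier keys appear in lexicographic order, as in the example purl.
    purl = f"pkg:generic/{name}?checksums={checksum_value}&download_url={download_url}"
    return {
        "name": name,
        "purl": purl,
        "properties": [{"name": "cachi2:found_by", "value": "cachi2"}],
        "type": "file",
        "externalReferences": [{"url": download_url, "type": "distribution"}],
    }


if __name__ == "__main__":
    component = generic_component(
        "granite-model-1.safetensors",
        "https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors",
        {"sha256": "d16bf783cb6670f7f692ad7d6885ab957c63cfc1b9649bc4a3ba1cfbdfd5230c"},
    )
    print(json.dumps(component, indent=2))
```

Because the URL is inserted unencoded, a URL that itself contains `?` or `&` (such as the lockfile's `...?download=true`) would make the purl ambiguous. This is one reason a production implementation would more likely let a purl library percent-encode the qualifier values.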