From 65628a07bd3486eb276e3141d2097f8754a5850e Mon Sep 17 00:00:00 2001
From: Jan Koscielniak
Date: Tue, 17 Sep 2024 10:27:18 +0200
Subject: [PATCH] fixup! Address code review

Changelog:
- lockfile has two ways of specifying the url
- add a note about ExternalReferences in the SBOM
- explicitly reference EC rule used to validate download url
- formatting
---
 docs/design/generic.md | 68 +++++++++++++++++++++++++++++-------------
 1 file changed, 48 insertions(+), 20 deletions(-)

diff --git a/docs/design/generic.md b/docs/design/generic.md
index b1c14690a..6b4024117 100644
--- a/docs/design/generic.md
+++ b/docs/design/generic.md
@@ -9,50 +9,67 @@ requirements of a hermetic build.
 
 ## Context
 
-For context, generic artifact fetching is a use-case of its own (e.g. [OVAL feeds](https://github.com/CISecurity/OVALRepo),
-AI models), it is also necessary precursor for implementing support for fetching maven artifacts, which won't be covered
-in this design, but in a followup document.
+For context, generic artifact fetching is a use-case of its own (e.g. [OVAL feeds][oval-feeds], AI models), but it is
+also a necessary precursor for implementing support for fetching maven artifacts, which won't be covered in this design,
+but in a followup document.
 
 ## Design
 
 In this section, I will try to cover individual parts of the design.
 
-### Source repository
+### Source repository and cachi2 lockfile
 
 This section will describe the structure of the source repository, that will serve as an input to cachi2.
 The idea is to define a cachi2 lockfile that will specify individual artifacts to fetch along with necessary metadata - e.g. checksums.
-The format chosen for this lockfile is yaml, and will include [purl](https://github.com/package-url/purl-spec) for each
+The format chosen for this lockfile is yaml, and will include [purl] for each
 of the fetched artifacts. This decision was made mainly because it allows for followup implementation of maven support,
-with accurate SBOM information. Here's an example of such a lockfile.
+with accurate SBOM information. Alternatively, for better user experience, the download url and checksums can be specified
+separately, always resulting in a `pkg:generic` purl. Here's an example of such a lockfile with both options.
 
 ```yaml
 artifacts:
   - purl: pkg:generic/granite-model?download_url=https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00001-of-00003.safetensors?download=true
-    target: granite-model.safetensors
+    target: granite-model-1.safetensors
     checksums:
       sha256: 07123e1f482356c415f684407a3b8723e10b2cbbc0b8fcd6282c49d37c9c1abc
+  - download_url: https://huggingface.co/instructlab/granite-7b-lab/resolve/main/model-00002-of-00003.safetensors?download=true
+    target: granite-model-2.safetensors
+    checksums:
+      sha256: 90bffe1884b84d5e255f12ff0ecbd70f2edfc877b68d612dc6fb50638b3ac17c
 ```
 
-#### Lockfile format and validation
+#### Specify by purl
+
+This option is provided mostly as a necessary step for maven support, but can be used for generic artifacts as well.
+At this time, the only allowed purl type is `pkg:generic`, which results in a `pkg:generic` SBOM component.
 
 ##### purl (required)
 
 At this point, the only purl type allowed would be `pkg:generic`. This is because cachi2 has no good way of verifying
 additional properties of the fetched artifact that could be included in the resulting SBOM. This should create a strong
 incentive to use this feature in the only truly necessary cases, because it will generate low-quality SBOM components,
-as compared to using other package managers provided by cachi2. Additionally, the only allowed qualifier should be `download_url`.
+as compared to using other package managers provided by cachi2. Additionally, the only allowed qualifiers will be
+`download_url` and `checksums`.
 
-#### target (optional)
+#### Specify by download_url and checksums
+
+This option offers a better user experience by letting the url and checksums be specified separately. It will always
+result in a `pkg:generic` purl.
+
+##### download_url (required)
 
-This is mainly for the users convenience, so the files end up in expected locations. Target here means a specific subdirectory
-inside cachi2's output directory. Special care needs to be taken to ensure there is not a conflict with other downloaded files.
-If not specified, filename of the downloaded file will be used.
+Specified as a string containing the download url of the artifact.
 
-##### checksums (optional)
+##### checksums (required)
 
-I've chosen tho separate checksums from the purl, mostly for better readability of the lockfile, but this can be up for
-discussion. If no checksum is provided, cachi2 should still download the artifact, but report this fact in the output
-SBOM component.
+Specified as a dictionary mapping checksum algorithms to their values. At least one cachi2-verifiable checksum must be provided.
+
+#### target (optional)
+
+This key is common to both options and is provided mainly for the user's convenience, so the files end up in expected locations.
+Target here means a specific path inside cachi2's output directory (likely `cachi2-output/deps/generic`).
+Special care needs to be taken to avoid conflicts with other downloaded files. If not specified, the filename
+of the downloaded file will be used.
 
 ### SBOM
 
@@ -62,21 +79,25 @@ very little space for the user to provide inaccurate information. Cachi2 should
 checksums and report the purl as-is, as it contains no extra information.
 The section below outlines how that information will be verified at later time.
 
+Additionally, the SBOM component for an artifact fetched this way should contain the [ExternalReferences][external-references]
+key with `type` set to `distribution` and `url` set to the download url gathered from the purl.
+
 ### Validation of user input
 
 As stated above, cachi2 will perform little to no verification of identity of the downloaded artifacts besides verifying checksums.
 However, it will provide enough information in the SBOM so tooling that comes after cachi2 can enforce policies.
-An example of this would be the [Enterprise Contract](https://enterprisecontract.dev/) (EC) project, that enforces policies
+An example of this would be the [Enterprise Contract][ec] (EC) project, which enforces policies
 based on the provided SBOM.
 
 In the context of this feature, EC would be supplied with the following information by cachi2 in the SBOM:
 
 - checksums were provided and verified
 - list of checksum algorithms used
-- download urls (as part of the purl)
+- download urls (in the `ExternalReferences` key)
 
 Enterprise contract policy would then be able to restrict accepting content without checksums, enforce certain algorithms
-for checksum verification or only allow certain patterns in the download url.
+for checksum verification or only allow certain patterns in the download url (using the existing [allow][ec-allow]/
+[deny][ec-deny] rules).
 
 ### Integration testing
 
@@ -93,3 +114,10 @@ Here's a preliminary work breakdown:
 - add integration tests covering the new package manager
 - generate PURLs for all downloaded artifacts
 - add documentation
+
+[ec]: https://enterprisecontract.dev/
+[ec-allow]: https://enterprisecontract.dev/docs/ec-policies/release_policy.html#sbom_cyclonedx__allowed_package_external_references
+[ec-deny]: https://enterprisecontract.dev/docs/ec-policies/release_policy.html#sbom_cyclonedx__disallowed_package_external_references
+[external-references]: https://cyclonedx.org/docs/1.4/json/#externalReferences
+[oval-feeds]: https://github.com/CISecurity/OVALRepo
+[purl]: https://github.com/package-url/purl-spec
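
The lockfile rules described in this patch can be made concrete with a minimal Python sketch. It shows how the two entry styles (`purl` vs. `download_url` plus `checksums`) could be normalized and validated, how the default `target` could be derived, and how an SBOM component could carry the download url in `externalReferences`. Every name in it (`LockfileError`, `resolve_artifact`, `sbom_component`, ...) is hypothetical and is not cachi2's API. The sketch also rests on several assumptions. It expects the YAML to be loaded into a plain dict already. It approximates "cachi2-verifiable" with the algorithms in Python's `hashlib.algorithms_guaranteed`, excluding the variable-length SHAKE variants. It enforces checksums only for `download_url` entries, because the patch marks them required there but does not restate the rule for purl entries. It names the generated `pkg:generic` purl after the target, a choice the design leaves open. Finally, the purl parsing is deliberately simplified and does not implement the full purl spec.

```python
import hashlib
import posixpath
from urllib.parse import quote, unquote, urlparse

# Assumption: "cachi2-verifiable" is approximated by hashlib's guaranteed
# algorithms, minus the SHAKE variants whose hexdigest() needs a length.
SUPPORTED_ALGORITHMS = {
    alg for alg in hashlib.algorithms_guaranteed if not alg.startswith("shake_")
}
ALLOWED_QUALIFIERS = {"download_url", "checksums"}


class LockfileError(ValueError):
    """Raised when a lockfile entry breaks one of the proposed rules (hypothetical)."""


def parse_generic_purl(purl: str) -> tuple[str, dict[str, str]]:
    """Split a pkg:generic purl into name and qualifiers (simplified, not the full purl spec)."""
    if not purl.startswith("pkg:generic/"):
        raise LockfileError(f"only pkg:generic purls are allowed: {purl}")
    # Everything after the first '?' is treated as the qualifier string.
    name, _, query = purl[len("pkg:generic/"):].partition("?")
    qualifiers = {}
    for pair in filter(None, query.split("&")):
        key, _, value = pair.partition("=")
        qualifiers[key] = unquote(value)
    unexpected = set(qualifiers) - ALLOWED_QUALIFIERS
    if unexpected:
        raise LockfileError(f"unsupported purl qualifiers: {sorted(unexpected)}")
    return name, qualifiers


def resolve_artifact(entry: dict) -> dict:
    """Normalize one entry (purl style or download_url style) to url/target/checksums/purl."""
    if ("purl" in entry) == ("download_url" in entry):
        raise LockfileError("specify exactly one of 'purl' or 'download_url'")
    checksums = entry.get("checksums") or {}
    if "download_url" in entry:
        if not any(alg in SUPPORTED_ALGORITHMS for alg in checksums):
            raise LockfileError("download_url entries need a verifiable checksum")
        url = entry["download_url"]
    else:
        _, qualifiers = parse_generic_purl(entry["purl"])
        if "download_url" not in qualifiers:
            raise LockfileError("the purl must carry a download_url qualifier")
        url = qualifiers["download_url"]
    # Default target: the file name from the URL path, ignoring any query string.
    target = entry.get("target") or posixpath.basename(urlparse(url).path)
    # Hypothetical naming for the download_url style: reuse the target as the purl name.
    purl = entry.get("purl") or f"pkg:generic/{target}?download_url={quote(url, safe='')}"
    return {"url": url, "target": target, "checksums": checksums, "purl": purl}


def resolve_lockfile(lockfile: dict) -> list[dict]:
    """Resolve every artifact and reject targets that would overwrite each other."""
    resolved = [resolve_artifact(entry) for entry in lockfile.get("artifacts", [])]
    targets = [artifact["target"] for artifact in resolved]
    duplicates = sorted({t for t in targets if targets.count(t) > 1})
    if duplicates:
        raise LockfileError(f"conflicting targets: {duplicates}")
    return resolved


def verify_checksums(data: bytes, checksums: dict[str, str]) -> None:
    """Compare downloaded bytes against every checksum whose algorithm hashlib can compute."""
    for algorithm, expected in checksums.items():
        if algorithm in SUPPORTED_ALGORITHMS:
            actual = hashlib.new(algorithm, data).hexdigest()
            if actual != expected.lower():
                raise LockfileError(f"{algorithm} mismatch: expected {expected}, got {actual}")


def sbom_component(artifact: dict) -> dict:
    """CycloneDX-style component; the download url goes into externalReferences."""
    return {
        "type": "file",  # assumption: the design does not fix the component type
        "name": artifact["target"],  # assumption: component name taken from the target
        "purl": artifact["purl"],
        "externalReferences": [{"type": "distribution", "url": artifact["url"]}],
    }
```

Rejecting duplicate targets up front is one way to provide the "special care" the `target` section asks for. Because the default target is the url's file name, two artifacts such as `https://a.example/x.bin` and `https://b.example/x.bin` would otherwise land on the same path.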