generic fetcher: Official support with ADR #728

kosciCZ · 2024-11-06T11:39:04Z

This change includes the generic fetcher ADR which is a necessary step for the feature to be officially supported. Split from #718

Maintainers will complete the following section

Commit messages are descriptive enough
Code coverage from testing does not decrease and new code is covered
Docs updated (if applicable)
Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

approve GitHub Actions workflows by clicking a button
approve the Red Hat Trusted App Pipeline container build by commenting /ok-to-test
(as is the standard for Pipelines as Code)

docs/adr/0001-add-generic-fetcher.md

eskultety

@kosciCZ minor suggestions I'd say but content wise not much really, I see the important stuff I would expect from an ADR mentioned there.

a-ovchinnikov

A few trivial modifications to ADR to reflect the most recent changes, also some wording nitpicks, LGTM otherwise.

docs/adr/0001-add-generic-fetcher.md

cachi2/core/package_managers/generic/models.py

docs/usage.md

brunoapimentel · 2024-11-12T17:46:57Z

cachi2/core/package_managers/generic/main.py

@@ -32,24 +32,29 @@ def fetch_generic_source(request: Request) -> RequestOutput:
    components = []
    for package in request.generic_packages:
        path = request.source_dir.join_within_root(package.path)
-        components.extend(_resolve_generic_lockfile(path, request.output_dir))
+        lockfile = package.lockfile or path.join_within_root(DEFAULT_LOCKFILE_NAME).path


I'm not sure if requiring users to provide absolute paths for the lockfile via the CLI is the best approach here. It might be tricky in the context of a CI system where the user does not know exactly what the path to the lockfile will be. We could potentially allow for relative paths if we resolved them within request.source_dir.

Did we verify if this approach would solve any current use case? I'm getting the impression that fetching the lockfile from a remote location could be needed in some cases.

Another concern is security implications of allowing users to specify a filepath that resolves outside of the request's root dir, as it could be used to exploit any larger CI system that Cachi2 is part of. For the current use cases, Cachi2 always verifies that all the files it is reading are within the request's root dir, and that's the reason we have the RootedPath class.

This is very important in context of using package managers built-in commands that are outside of our control. I reckon that in this case, we'd be safer since we have control over what we're doing with the file. Still, I think some extra consideration is needed since this sets a new precedent for the project.

Ideally, from a security perspective, we'd allow only relative paths based on request.source_dir. But on the other hand, we're trying to solve the use case of upstream projects that would not commit an artifacts.lock.yaml file to it, so this would not be sufficient.

As I suggested before, we could postpone this to another moment in case we're not certain of the decision now. I don't think we should block this ADR because of this.

FYI We as reviewers are contradicting each other in our suggestions: #728 (comment) which is what @kosciCZ followed so we need to get things straight rather quickly otherwise it's an awfully frustrating experience for the contributor.

Another concern is security implications of allowing users to specify a filepath that resolves outside of the request's root dir, as it could be used to exploit any larger CI system that Cachi2 is part of. For the current use cases, Cachi2 always verifies that all the files it is reading are within the request's root dir, and that's the reason we have the RootedPath class.

We do and that makes total sense for most scenarios, manifest files IMO not being one of them, but in general we should keep honoring this I agree.

This is very important in context of using package managers built-in commands that are outside of our control. I reckon that in this case, we'd be safer since we have control over what we're doing with the file. Still, I think some extra consideration is needed since this sets a new precedent for the project.

The manifest file is a declarative yaml file, so in order for this to be exploitable the absolute path would have to be a symlink to a valid YAML file containing some hidden parser instructions to exploit the YAML parser itself which would then need to gain full control over our own process; other than that I don't see how we are safer with a potentially malicious checked-in manifest file, IOW this manifest file only contains data pointing to online resources which we are not able to evaluate on our own anyway, so how do you see this manifest file exploiting the containing CI systems cachi2 is part of? I think any pipeline subsystem/service should be hardened on its own for this very reason that any component can essentially be exploited and so nothing can be implicitly trusted and attack vectors can only be mitigated, not avoided.

Ideally, from a security perspective, we'd allow only relative paths based on request.source_dir. But on the other hand, we're trying to solve the use case of upstream projects that would not commit an artifacts.lock.yaml file to it, so this would not be sufficient.

Relative paths solve nothing in general because the same file a user would provide via an absolute path they'd now purposfully copy over to the repository to pass our check, so from security perspective we didn't achieve anything, the file is either checked-in to the repo in which case we're putting our confidence into the community collective responsibility for the project or it isn't checked into the repository in which case we have absolutely no guarantee on anything regardless of where the file came from and since users can customize their pipeline automation (e.g. tekton) you can bet they'll use whatever power they have to workaround our trivial path checks (also demonstrated by an example posted by @kosciCZ here: https://github.com/red-hat-data-services/trustyai-explainability/pull/44/files)

As I suggested before, we could postpone this to another moment in case we're not certain of the decision now. I don't think we should block this ADR because of this.

I disagree , I outlined why in: #728 (comment) . If we make this a follow up work, then:

it'll likely never happen and

we'd be forcing our user base to check the manifest file to their repos and once they do so, they won't be willing to go back easily which is bad UX IMO solely because we're forcing people to commit random manifest files to their repos which don't really relate to their project in any way it's just that someone told the project to be built in a particular manner which requires them to use a non-standard configuration file which has no widely-accepted ecosystem around it - perfectly fine for downstream solutions (do whatever you want), not okay with upstream projects IMO

The potential exploit would be this: {"type": "generic", "lockfile": "/root/.docker/config.json"} - an attempt to make cachi2 raise a validation error which would print the content of a file that the CI task wasn't intended to access

The point is more that until now, cachi2 promised not to read/write files outside the --source, --output and --config-file locations. This change breaks that promise, but if the error messages are safe it's not a huge deal

Do we have a promise like that anywhere? Also, that promise isn't broken with a default CLI, cachi2 would only ever look beyond if instructed with additional JSON input arguments on the CLI, so I don't think we're breaking any promises.

TLDR: the error message could be a tiny bit better (avoid exposing the keys), but no further concerns from my side

I had discussion with Adam, so I see there 2 POW:

INPUT is just considered to be regular CLI params, just in json form, then wrapper around CLI (user, OSBS) must ensure, that provide paths are right (any paths from nested files must not get out of CWD)

INPUT is just inline config file and wrapper cannot do much about it, then cachito should ensure that it's not reading outside of CWD, as wrapper may not have the power to ensure paths are safe.

Given the complexity of JSON file, I'm probably more eager to agree with 2

*CWD or dir explicitly set by separate option

idealistically my view is option 2, but practically making an exception for the generic fetcher is fine IMO

let's go with 1) ; I think users may want to provide this file from a separate task, not via source

brunoapimentel

This LGTM, the only thing I wanted is more feedback on this concern here.

My last comment is that I think we should use this PR to move generic out of dev-package-managers, effectively making it stable.

brunoapimentel · 2024-11-12T17:58:55Z

/ok-to-test

eskultety · 2024-11-13T08:46:13Z

This LGTM, the only thing I wanted is more feedback on this concern here.

My last comment is that I think we should use this PR to move generic out of dev-package-managers, effectively making it stable.

If you want to lift the experimental status, then I believe the ability to consume manifest files for the generic fetcher via a secondary channel needs to addressed, but most importantly all maintainers need to be in agreement with the approach.

kosciCZ · 2024-11-13T09:11:09Z

If you want to lift the experimental status, then I believe the ability to consume manifest files for the generic fetcher via a secondary channel needs to addressed, but most importantly all maintainers need to be in agreement with the approach.

@eskultety So do I understand correctly that despite everything else being resolved and ready, this PR is now blocked for two weeks since Bruno is on vacation, despite the feature of being able to specify an absolute lockfile path being approved by two out of three reviewers?

eskultety · 2024-11-14T13:34:04Z

If you want to lift the experimental status, then I believe the ability to consume manifest files for the generic fetcher via a secondary channel needs to addressed, but most importantly all maintainers need to be in agreement with the approach.

@eskultety So do I understand correctly that despite everything else being resolved and ready, this PR is now blocked for two weeks since Bruno is on vacation, despite the feature of being able to specify an absolute lockfile path being approved by two out of three reviewers?

First of all, I don't think this personal plan/situation was publicly announced anywhere and so I don't consider it appropriate doing it on his behalf this way in a public space, so please respect others' privacy and refrain from doing so in the future.

Now, to your question, I will summarize what I said in the relevant threads regarding this concern:

allowing external YAML config files is only as safe as the YAML parser itself we rely on
we already allow our own cachi2 config to be passed from an arbitrary location with an absolute path and it is a YAML, so if a manifest could be used to exploit the YAML parser so could be via our own cachi2 config
we don't check whether files are truly checked into a repo and so if we don't allow arbitrary paths for the artifact manifest people will deliberately circumvent that by injecting the file from a different location just before running cachi2
forcing the manifest to be read from the repo and repo only is IMO only a marginal improvement from security POV (that said, additional hardening should be employed wherever cachi2 is to be executed with elevated privileges), but a bigger convenience penalty for the user base
makes IMO no sense to force people checking a random YAML manifest file to their repos to satisfy a particular 3rd party build toolchain

With all that, no, I am not convinced this work needs to be blocked for any longer with the approvals in place.

eskultety

While I cannot rule out the theoretical possibility of exploiting our input model validators in order to disclose some unintended data of varying severity, I believe the actual problem (pre-existing) is bound to the logging/error reporting system which may need to be further hardened which is something outside of scope of this PR.

kosciCZ · 2024-11-14T14:16:17Z

I appreciate the approval. I will add one more commit that lifts the experimental status.

Target location for the individual fetched artifacts specified in the lockfile is now called `filename`, to better signify its purpose. Signed-off-by: Jan Koscielniak <[email protected]>

This change removes the option to specify multiple checksums for a given artifact in the lockfile as a map and replaces it with a single checksum as a string of "alg:hash" format Signed-off-by: Jan Koscielniak <[email protected]>

For better UX, the lockfile for the generic fetcher is now called `artifacts.lock.yaml` Signed-off-by: Jan Koscielniak <[email protected]>

Users are now allowed to supply an absolute path of the generic lockfile. This should help in cases where it does not make sense to commit the file to the source repository. Signed-off-by: Jan Koscielniak <[email protected]>

Previously, the lockfile validation passed, even if user provided extra information in the lockfile. The lockfile format must now match the schema exactly. Signed-off-by: Jan Koscielniak <[email protected]>

Adds example usage of the `filename` option in the lockfile. Signed-off-by: Jan Koscielniak <[email protected]>

This change includes the generic fetcher ADR which is a necessary step for the feature to be officially supported. Signed-off-by: Jan Koscielniak <[email protected]>

Generic fetcher is now an officially supported part of cachi2. `--dev-package-managers` flag is no longer needed. Signed-off-by: Jan Koscielniak <[email protected]>

brunoapimentel reviewed Nov 7, 2024

View reviewed changes

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

brunoapimentel reviewed Nov 7, 2024

View reviewed changes

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

a-ovchinnikov reviewed Nov 7, 2024

View reviewed changes

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

kosciCZ force-pushed the generic-fetcher-adr branch from 9ca574d to ef5c2d9 Compare November 8, 2024 12:46

eskultety reviewed Nov 8, 2024

View reviewed changes

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

eskultety reviewed Nov 8, 2024

View reviewed changes

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

eskultety reviewed Nov 8, 2024

View reviewed changes

docs/adr/0001-add-generic-fetcher.md Outdated Show resolved Hide resolved

eskultety reviewed Nov 8, 2024

View reviewed changes

kosciCZ force-pushed the generic-fetcher-adr branch 3 times, most recently from e23ff34 to edfeca5 Compare November 12, 2024 12:41

kosciCZ requested review from eskultety, brunoapimentel and a-ovchinnikov November 12, 2024 13:19

kosciCZ mentioned this pull request Nov 12, 2024

Design for generic artifact fetching #652

Closed

4 tasks

a-ovchinnikov reviewed Nov 12, 2024

View reviewed changes

brunoapimentel reviewed Nov 12, 2024

View reviewed changes

kosciCZ force-pushed the generic-fetcher-adr branch from edfeca5 to 1ad7a13 Compare November 13, 2024 11:10

kosciCZ requested a review from a-ovchinnikov November 13, 2024 13:28

kosciCZ force-pushed the generic-fetcher-adr branch from 1ad7a13 to 7e459d3 Compare November 13, 2024 13:38

a-ovchinnikov approved these changes Nov 13, 2024

View reviewed changes

eskultety approved these changes Nov 14, 2024

View reviewed changes

kosciCZ added 2 commits November 14, 2024 15:33

generic fetcher: Rename target in lockfile

38c1f29

Target location for the individual fetched artifacts specified in the lockfile is now called `filename`, to better signify its purpose. Signed-off-by: Jan Koscielniak <[email protected]>

generic fetcher: Change checksum format in the lockfile

98fb494

This change removes the option to specify multiple checksums for a given artifact in the lockfile as a map and replaces it with a single checksum as a string of "alg:hash" format Signed-off-by: Jan Koscielniak <[email protected]>

kosciCZ added 5 commits November 14, 2024 15:33

generic fetcher: Rename default lockfile filename

5e686e2

For better UX, the lockfile for the generic fetcher is now called `artifacts.lock.yaml` Signed-off-by: Jan Koscielniak <[email protected]>

generic fetcher: Allow users to specify lockfile location

bf56dff

Users are now allowed to supply an absolute path of the generic lockfile. This should help in cases where it does not make sense to commit the file to the source repository. Signed-off-by: Jan Koscielniak <[email protected]>

generic fetcher: Enforce lockfile format more strictly

b83083a

Previously, the lockfile validation passed, even if user provided extra information in the lockfile. The lockfile format must now match the schema exactly. Signed-off-by: Jan Koscielniak <[email protected]>

generic fetcher: Improve usage docs

50ce5ec

Adds example usage of the `filename` option in the lockfile. Signed-off-by: Jan Koscielniak <[email protected]>

generic fetcher: Official support with ADR

f37b120

This change includes the generic fetcher ADR which is a necessary step for the feature to be officially supported. Signed-off-by: Jan Koscielniak <[email protected]>

kosciCZ force-pushed the generic-fetcher-adr branch from 7e459d3 to f3ae6cb Compare November 14, 2024 14:33

generic fetcher: Lift experimental status

5ebe736

Generic fetcher is now an officially supported part of cachi2. `--dev-package-managers` flag is no longer needed. Signed-off-by: Jan Koscielniak <[email protected]>

kosciCZ force-pushed the generic-fetcher-adr branch from f3ae6cb to 5ebe736 Compare November 14, 2024 15:06

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

generic fetcher: Official support with ADR #728

generic fetcher: Official support with ADR #728

kosciCZ commented Nov 6, 2024

eskultety left a comment

a-ovchinnikov left a comment

brunoapimentel Nov 12, 2024

brunoapimentel Nov 12, 2024

brunoapimentel Nov 12, 2024

eskultety Nov 13, 2024

chmeliik Nov 14, 2024

eskultety Nov 14, 2024

chmeliik Nov 14, 2024

MartinBasti Nov 14, 2024

chmeliik Nov 14, 2024

MartinBasti Nov 14, 2024

brunoapimentel left a comment

brunoapimentel commented Nov 12, 2024

eskultety commented Nov 13, 2024

kosciCZ commented Nov 13, 2024

eskultety commented Nov 14, 2024

eskultety left a comment

kosciCZ commented Nov 14, 2024

generic fetcher: Official support with ADR #728

Are you sure you want to change the base?

generic fetcher: Official support with ADR #728

Conversation

kosciCZ commented Nov 6, 2024

Maintainers will complete the following section

eskultety left a comment

Choose a reason for hiding this comment

a-ovchinnikov left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

brunoapimentel left a comment

Choose a reason for hiding this comment

brunoapimentel commented Nov 12, 2024

eskultety commented Nov 13, 2024

kosciCZ commented Nov 13, 2024

eskultety commented Nov 14, 2024

eskultety left a comment

Choose a reason for hiding this comment

kosciCZ commented Nov 14, 2024