Improve Trusty integration #3277
one minor comment, but looks good.
// Classify all dependencies, tracking all that are malicious or scored low
for _, dep := range prDependencies.Deps {
	if err := classifyDependency(ctx, &logger, e.client, ruleConfig, prSummaryHandler, dep); err != nil {
		return fmt.Errorf("classifying dependency: %w", err)
Not blocking, but do you think we should error out the whole evaluation here? I wonder if we could classify a dependency as something like unknown instead.
Since this logic is responsible for blocking PRs introducing malware into the codebase, I would rather build a more resilient client to make the requests more robust and indeed fail if we can't get the trusty score instead of just letting it through as an unknown. WDYT?
The only two cases where this might fail are when there is an error talking to trusty or due to a misconfiguration in the profile.
There is more payload from Trusty we could leverage here, such as provenance: sigstore or historical provenance. It might make sense to surface a threshold here for folks to set within the policy. This could be a float (I think?) between 1 and 10, or simply a bool. cc: @therealnb, @yrobla. We also have deprecated or archived available as fields to report in the same way as malicious.
✅ No Invisible Unicode Characters Detected.
✅ No Mixed Scripts Detected.
I really like the new format, but it seems like a high-scoring package is getting flagged now as well, see e.g. jakubtestorg/bad-python#193
// summary score is used.
// If `evaluate_score` is set to something else (e.g. `provenance`)
// then that score is used, which comes from the details field.
EvaluateScore string `json:"evaluate_score" mapstructure:"evaluate_score"`
Oh this is probably why the "good" packages are now flagged as low scoring? Does it mean that everyone who deployed this profile with the old ruletype (before https://github.com/stacklok/minder-rules-and-profiles/pull/111) would get all their deps flagged?
Mmh, interesting. Let me check why that one is getting flagged. The removal of EvaluateScore should not affect this one in particular, as it has a high score and also a high provenance component.
I could not reproduce it (see here). It is weird, because the profile is the same and the previous EvaluateScore is not used anymore. I'll keep looking.
Let me try to reproduce again so I don't send you down the wrong path.
I think there are two issues. First, the evaluator has a hardcoded default configuration, which should also include the two new attributes. Second, the condition used when evaluating the scores seems to be reversed: shouldn't we check that the score of the package is higher than the configured value? Check out the diff below:
This PR surfaces the trusty malicious data to the comment added by minder when the trusty evaluator inspects a PR. Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
This commit adds a few unit tests to some of the new utility functions handling the trusty evaluator. Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
This commit drops the configuration getter from the trusty ecosystem config. As we now have access to the individual components we can write rules on each of them independent of each other. Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
This commit implements the new trusty template which exposes all score components. Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
This commit adds a simple test to ensure the comment template parses correctly. Signed-off-by: Adolfo García Veytia (Puerco) <[email protected]>
Ah, good catch Jakub, I've flipped the logic on the constants. Although those were not being used for the output anymore, they were still being weighted to classify. I've pushed a change with the flipped comparisons.
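For readers following along, the direction of the comparison under discussion can be sketched as below: a component is flagged only when the package's value falls below the configured threshold. The struct shapes, the field names (`Provenance`, `Activity`), and the default values are assumptions made for this illustration, not the evaluator's real configuration.

```go
package main

// ecosystemThresholds is a hypothetical config shape. The field names and
// the defaults below are illustrative, not the evaluator's actual ones.
type ecosystemThresholds struct {
	Score      float64
	Provenance float64
	Activity   float64
}

// defaultThresholds returns assumed defaults that cover every component,
// so that a profile written before new fields existed does not end up
// comparing against a zero or missing value by accident.
func defaultThresholds() ecosystemThresholds {
	return ecosystemThresholds{Score: 5, Provenance: 5, Activity: 5}
}

// packageScores holds the per-component values reported for one package.
type packageScores struct {
	Summary    float64
	Provenance float64
	Activity   float64
}

// lowScoreReasons lists the components whose value is below its threshold.
// A package is "low scoring" only when this list is non-empty.
func lowScoreReasons(s packageScores, t ecosystemThresholds) []string {
	var reasons []string
	if s.Summary < t.Score {
		reasons = append(reasons, "score")
	}
	if s.Provenance < t.Provenance {
		reasons = append(reasons, "provenance")
	}
	if s.Activity < t.Activity {
		reasons = append(reasons, "activity")
	}
	return reasons
}
```

Writing the check as "value below threshold means flagged" keeps the direction explicit; reversing the operator would flag exactly the high-scoring packages, which is the symptom reported earlier in this thread.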
Should we have a test for this? It feels like the sort of bug that we could easily accidentally introduce again.
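One possible shape for such a regression test is a small table-driven test that pins the comparison at, just below, and above the threshold. This is only a sketch: `isLowScore` is a hypothetical helper, and treating a score equal to the threshold as acceptable is an assumption rather than confirmed behavior of the evaluator.

```go
package main

import "testing"

// isLowScore is a hypothetical helper: a score is low only when it is
// strictly below the configured threshold (assumed semantics).
func isLowScore(score, threshold float64) bool {
	return score < threshold
}

// TestIsLowScore pins the direction of the comparison so that an accidental
// flip of the operator fails immediately.
func TestIsLowScore(t *testing.T) {
	cases := []struct {
		name      string
		score     float64
		threshold float64
		want      bool
	}{
		{name: "well above threshold", score: 9, threshold: 5, want: false},
		{name: "equal to threshold", score: 5, threshold: 5, want: false},
		{name: "just below threshold", score: 4.9, threshold: 5, want: true},
		{name: "zero score", score: 0, threshold: 5, want: true},
	}
	for _, tc := range cases {
		t.Run(tc.name, func(t *testing.T) {
			if got := isLowScore(tc.score, tc.threshold); got != tc.want {
				t.Errorf("isLowScore(%v, %v) = %v, want %v",
					tc.score, tc.threshold, got, tc.want)
			}
		})
	}
}
```

The boundary cases matter most here, since a reversed operator passes or fails every one of them in the opposite way.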
We do have a smoke test, but alas, no way of running it locally yet. You are right that we should do better on the unit testing front. Not having proper tests is mostly my fault: we do have reasonable tests for the OSV evaluator, and the current Trusty evaluator code was meant to be a "quick hack" until we can generalize the PR dependency evaluators into common code and build both the OSV and Trusty evaluators atop them. We just haven't prioritized that work.
thank you for being patient with this long review!
I think we can merge this now. Let's start adding tests in further PRs.
Summary
This PR surfaces the Trusty maliciousness data in the comment added by Minder when analyzing dependencies. It also adds the required piping to add the data to the rest of the required low-scored dependencies.
I've broken the Trusty evaluator into more narrowly scoped utility functions to make them more testable, and added a few initial unit tests. It needs a little more mocking before we can write an integration test and the rest of the unit tests.
I've simplified the comment template: we now have a single template instead of 3, and it is now pure markdown, which is smaller. Here is a screenshot of the output with malicious dependencies:
Demo PRs showing
Change Type
Mark the type of change your PR introduces:
Testing
Added initial tests (up to 22% from 0 :) )
Review Checklist: