Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 1 (No Data Backfilling) #2161

Vyom-Yadav · 2024-01-21T19:49:40Z

Major Changes and Implications:

Introducing a new field, name, which serves to uniquely identify rules. This field will be optional in certain scenarios.
Current rule evaluations will remain unchanged and will not include the rule_name. Instead, evaluations for existing rules will continue using the previous process exclusively.
New profiles will undergo evaluation and insertion using the updated model.
Documentation will outline the conditions under which the name field is optional and when it is required.

eleftherias

Thank for the PR @Vyom-Yadav! This is looking great!

I have a small comment inline and a general question about the reconciliation.

When trying this out locally, I had an older profile entry in my DB with 2 dependabot_configured rules.
After applying the migration and updating the server, I see that the JSON contextual_rules in entity_profiles is updated with the generated names for both dependabot rules, which is great.
I also see that I have 1 rule with the type dependabot_configured in rule_evaluations with the correctly generated rule name.
However, the 2nd dependabot_configured rule is not in my rule_evaluations.

Is it by design that we only keep the one previously saved rule? Or should we create the second rule, even though there was no entry for it in rule_evaluations previously?

proto/minder/v1/minder.proto

Vyom-Yadav · 2024-01-22T17:09:30Z

However, the 2nd dependabot_configured rule is not in my rule_evaluations.

rule_evaluations are created and upserted by the engine. The problem is that rules for entities are evaluated again only when we receive certain events, like I mentioned in the desc:

The rules are reconciled by the subsequent InternalProfileInitEventTopic, and ExecuteEntityEventTopic events. Profiles/Repositories not receiving these types of events won't be reconciled to update them to the latest state.

Is it by design that we only keep the one previously saved rule? Or should we create the second rule, even though there was no entry for it in rule_evaluations previously?

It is by our event-driven architecture. But I have mixed feelings about it. If we reconcile continuously and not based on events (internal, webhooks, etc.), we will consume a lot of resources as the reconciliation would continuously evaluate rules for entities. On the other hand, if we don't, then situations like these arise when the DB is not in the correct state after an update.

The bottom line is with the current architecture and the way the engine processes the entities, after updating the db, rule evaluations won't be triggered again until we receive events of certain types. If we are using minder-server cli to update the DB, we can trigger the reevaluation for all entities in the DB. But if one is updating using native SQL, then that won't be possible. wdyt and what do you suggest?

evankanderson · 2024-01-22T18:14:01Z

However, the 2nd dependabot_configured rule is not in my rule_evaluations.

rule_evaluations are created and upserted by the engine. The problem is that rules for entities are evaluated again only when we receive certain events, like I mentioned in the desc:

The rules are reconciled by the subsequent InternalProfileInitEventTopic, and ExecuteEntityEventTopic events. Profiles/Repositories not receiving these types of events won't be reconciled to update them to the latest state.

This is unfortunate -- it would be nice if we could trigger a re-reconcile of the rules when we do this "update". (Ideally, we would either clear out the old evaluations that don't have a name and write new ones that did have a name.)

Is it by design that we only keep the one previously saved rule? Or should we create the second rule, even though there was no entry for it in rule_evaluations previously?

It is by our event-driven architecture. But I have mixed feelings about it. If we reconcile continuously and not based on events (internal, webhooks, etc.), we will consume a lot of resources as the reconciliation would continuously evaluate rules for entities. On the other hand, if we don't, then situations like these arise when the DB is not in the correct state after an update.

Edge-triggering is an optimization, but we should be aiming for a level-based (continuous reconciliation) system. So your unease is correct.

In Kubernetes, efficiency + level-based behavior is handled by having two different mechanisms for an event to be reconciled (this is in the controller / client libraries):

Edge-triggered reconciliation when a resource changes (as notified by Watch in the K8s case, or by webhook in the GitHub case)
Background reconciliations which happen on a periodic timer, whether or not there was any activity on the resource.

We're currently missing the second, but we should be thinking about adding it, probably as some sort of background scan of all the existing rules at an infrequent (e.g. flag controlled, default to 24 hour) visitation. We'd probably want some sort of pacing on this background scan as well -- we don't want to go into overdrive at 3AM, blow through GitHub API quota limits, and then do nothing from 3:30AM to 3AM the next day.

The bottom line is with the current architecture and the way the engine processes the entities, after updating the db, rule evaluation won't be triggered again until we receive events of certain types. If we are using minder-server cli to update the DB, we can trigger the reevaluation for all entities in the DB. But if one is updating using native SQL, then that won't be possible. wdyt and what do you suggest?

This may be a bias in my background, but I'm used to fairly "dumb" data stores and smarter clients. What would this look like if we did the promotion piecemeal outside the database:

handlers_profile.go will automatically create rule names on new Profiles based on the rule_type name if they are not already specified (I think you've got this).
handlers_profile.go will also run the same algorithm on output of existing rules to make rule names in a deterministic way. We'd still want to flag collisions, but we might automaticallly name the rules as e.g. <rule_type>, <rule_type>_1, <rule_type>_2, etc.
Update handlers_profile.go to search for rule_name = sqlc.narg(rule_name) OR rule_name IS NULL during the migration.
Update executor.go to fill in the rule.Name field on rules before passing them to TraverseRules. This could optionally store back to the database.

I think the interesting question is how to backfill the existing rule statuses -- two options are to simply leave the compatibility code in forever, and to backfill the database after the migration.

evankanderson · 2024-01-22T18:14:15Z

BTW -- I'm looking to discuss approaches in this review, not dictate them.

evankanderson · 2024-01-22T18:16:25Z

With that said, it also occurs to me that the database approach here may cause a short window of breakage between when the migration script runs and when the Minder server is updated with the new queries, because the existing queries won't be including the rule_name field, which is NOT NULL. If it somehow didn't cause breakage, that would be even more concerning, because it would mean that some profiles could slip in without the rule.name normalization applied.

evankanderson · 2024-01-22T18:17:59Z

Alternatively, since the 000013 migration is in a BEGIN; ... COMMIT transaction block, the other option is that this might block upgrades if there's a conflicting transaction before migration commits.

Vyom-Yadav · 2024-01-22T18:45:52Z

Background reconciliations which happen on a periodic timer, whether or not there was any activity on the resource.

Honestly, background reconciliation was the next issue I was going to pick, as we discussed that in #2109 (comment) too.

handlers_profile.go will automatically create rule names on new Profiles based on the rule_type name if they are not already specified (I think you've got this).

handlers_profile.go will also run the same algorithm on output of existing rules to make rule names in a deterministic way. We'd still want to flag collisions, but we might automaticallly name the rules as e.g. <rule_type>, <rule_type>_1, <rule_type>_2, etc.

Update handlers_profile.go to search for rule_name = sqlc.narg(rule_name) OR rule_name IS NULL during the migration.

Update executor.go to fill in the rule.Name field on rules before passing them to TraverseRules. This could optionally store back to the database.

Interesting, you are suggesting bringing the database to the correct state using Go. I used SQL as I didn't want to add the code of adding name to rows in handlers_profile, as it is a one-time operation while updating and doesn't make a lot of sense to store in a handler.

Ideally, we should have background reconciliation to solve this issue.

With that said, it also occurs to me that the database approach here may cause a short window of breakage between when the migration script runs and when the Minder server is updated with the new queries, because the existing queries won't be including the rule_name field, which is NOT NULL

Can we have some downtime to update the server? I don't see a way of making this update an atomic one. We could:

Update the server first, this would give a few errors (no crashes hopefully) when trying to delete or upsert rule evaluations as rule_name column doesn't exist. This would also prevent users from inserting new profiles with duplicate rules. They would be able to create profiles, but the server would throw errors while evaluating as no rule_name columns exist.
Update the database next.

This approach ensures no bad data (without name) sneaks in while performing updates. wdyt?

Vyom-Yadav · 2024-01-22T18:48:08Z

Update the server first, this would give a few errors (no crashes hopefully) when trying to delete or upsert rule evaluations as rule_name column doesn't exist. This would also prevent users from inserting new profiles with duplicate rules. They would be able to create profiles, but the server would throw errors while evaluating as no rule_name columns exist.

This would require some modifications to the SQL script, particularly not to add name to contextual rules when it already exists.

Vyom-Yadav · 2024-01-22T18:50:56Z

Also, I frequently use bold and all caps, please don't take it as shouting, it is just a way for me to highlight stuff.

evankanderson · 2024-01-23T14:31:18Z

Background reconciliations which happen on a periodic timer, whether or not there was any activity on the resource.

Honestly, background reconciliation was the next issue I was going to pick, as we discussed that in #2109 (comment) too.

Yeah, background reconciliation has been on the "to-do" list for a while, but there are a few other features we've been trading off against (like the ability to allow multiple collaborators on the same project) and it hasn't made it to the top yet. A PR for that would be be awesome.

handlers_profile.go will automatically create rule names on new Profiles based on the rule_type name if they are not already specified (I think you've got this).

handlers_profile.go will also run the same algorithm on output of existing rules to make rule names in a deterministic way. We'd still want to flag collisions, but we might automaticallly name the rules as e.g. <rule_type>, <rule_type>_1, <rule_type>_2, etc.

Update handlers_profile.go to search for rule_name = sqlc.narg(rule_name) OR rule_name IS NULL during the migration.

Update executor.go to fill in the rule.Name field on rules before passing them to TraverseRules. This could optionally store back to the database.

Interesting, you are suggesting bringing the database to the correct state using Go. I used SQL as I didn't want to add the code of adding name to rows in handlers_profile, as it is a one-time operation while updating and doesn't make a lot of sense to store in a handler.

Ideally, we should have background reconciliation to solve this issue.

Yeah, that would be my preference. I'm also coming at this more from an availability /

With that said, it also occurs to me that the database approach here may cause a short window of breakage between when the migration script runs and when the Minder server is updated with the new queries, because the existing queries won't be including the rule_name field, which is NOT NULL

Can we have some downtime to update the server? I don't see a way of making this update an atomic one. We could:

I agree that the update can't be atomic across all database rows and the server.

BTW, behind the scenes, we're running the helm chart through ArgoCD, which means that it will run the following operations:

Run the migration script
If the migration script succeeds, update the server replicas (multiple!)

It would be a bit tricky to actually take the server down, because ArgoCD will notice that something that's supposed to be running isn't, and will try to bring it back up.

One oddity here is that we've got an additional database we're triangulating against in terms of the GitHub API. Since the writes to rule_evaluations happen after we've done any work against GitHub (IIRC), we might end up doing work but not recording it if the database blocks writes (either because the NOT NULL column constraint, or because we're trying to write a non-existent column).

(All this operational stuff isn't normally visible to contributors, but database migrations with mandatory new columns are trickier for operational rollout than other types of PRs.)

Update the server first, this would give a few errors (no crashes hopefully) when trying to delete or upsert rule evaluations as rule_name column doesn't exist. This would also prevent users from inserting new profiles with duplicate rules. They would be able to create profiles, but the server would throw errors while evaluating as no rule_name columns exist.

Update the database next.

This approach ensures no bad data (without name) sneaks in while performing updates. wdyt?

To do this, we'd need to update the server code without the migration (adding the extra columns to the schema somehow without the migration) and add the code to write the values for new rows. We'd then have a second commit (second helm chart release) which did the migration.

BTW, by default Postgres considers NULL values distinct from each other, so you could actually do:

ALTER TABLE rule_evaluations
  ADD COLUMN rule_name TEXT;

CREATE UNIQUE INDEX rule_evaluations_results_idx2
  ON rule_evaluations (profile_id, repository_id, COALESCE(artifact_id, '00000000-0000-0000-0000-000000000000'::UUID),
                       entity, rule_type_id, COALESCE(pull_request_id, '00000000-0000-0000-0000-000000000000'::UUID),
                       rule_name);

-- Drop the existing unique index on rule_evaluations
DROP INDEX IF EXISTS rule_evaluations_results_idx;

in the first migration, and existing code would continue to work. You could then do step 2 in a separate PR to backfill the existing rules.

Operationally, we'd need to note (for ourselves, and anyone else running Minder as a service) that we'd need to push between commit A and commit B, but that is do-able. Commit B could then do the backfill and mark rule_evaluations NOT NULL.

evankanderson · 2024-01-22T18:11:22Z

database/migrations/000013_rule_evaluations_rule_name.up.sql

+CREATE UNIQUE INDEX rule_evaluations_results_idx
+  ON rule_evaluations (profile_id, repository_id, COALESCE(artifact_id, '00000000-0000-0000-0000-000000000000'::UUID),
+                       entity, rule_type_id, COALESCE(pull_request_id, '00000000-0000-0000-0000-000000000000'::UUID),
+                       rule_name);


I wonder if we want to set NULLS NOT DISTINCT on this index.

(We can't do that until after the backfill, though)

proto/minder/v1/minder.proto

evankanderson · 2024-01-24T20:06:11Z

I'm happy to review when you're ready -- does doing this as a two-part change (add column, start filling as phase 1, then use SQL to migrate all existing rules) make sense?

Vyom-Yadav · 2024-01-25T06:10:52Z

I'm happy to review when you're ready -- does doing this as a two-part change (add column, start filling as phase 1, then use SQL to migrate all existing rules) make sense?

Yeah, the initial update would support null and non-null rule-name. All new profiles added would have rule-name populated, and the older ones would be backfilled in the next update. There were some changes required for this in the handler, would push those after testing.

Vyom-Yadav · 2024-01-26T18:41:10Z

@evankanderson @eleftherias Just pinging in case you missed the review request. This and #2206 are ready for review. To give a brief summary:

The update will be done in two parts.
This PR adds support for rule_name. Existing rows would have rule_name as null, and newer profiles created after this update would have populated rule_name. The code can handle both null and non-null entires.
Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 2 (Data Backfilling) #2206 Fills the existing data. rule_name is calculated for profiles without one, and the evaluations are updated as well. We use an auxiliary table to support migrate-down of updated data.

Still, the existing data is not reconciled and is totally dependent on events to get updated to the latest state. This problem would be solved with the help of level-based reconciliation in another PR.

evankanderson

This is looking pretty good! The two comments I care about are:

I repeated this a bunch of times, but doing the defaulting before validation will probably simplify your logic, as right now you're trying to take into account what the defaults will be before they're written into the input.
I think we probably need to keep the badly-named rule fields in the proto for at least a month or so to give people a chance to migrate. We can put a big comment on them that the field is going away, but we probably don't want to break compatibility for all our existing users.

The second item is kinda tricky to think about given that we have client & server in the same codebase, so it's easy to make changes and think about them atomically. Sorry about that!

Some color that you can ignore if you want:

I've pointed out a few places where the naming is "java-flavored" (or "C#-flavored"). Go tends to towards more terse names (see the end of package names in Effective Go for an example). As another example, ruleTypeAndNamePair could probably be called ruleKey at no loss of understanding. For folks using more than one programming language, it can be tricky to keep track of the different "accents", so I'm not asking for fixes here. 😁

Another style note for future PRs (don't apply it here) is to separate refactoring changes (which are super-fast to review) from logic changes (which usually require a bit more thought). The rename of dbstat --> dbProfileStatus is an example of this: it adds a bunch of diffs to the UI that don't really affect correctness, but make it harder to see where other code is changing (because there's already a change on the line).

Thank you again for the nice tests in handlers_profile_test.go!

proto/minder/v1/minder.proto

evankanderson · 2024-01-28T09:03:51Z

proto/minder/v1/minder.proto

-    string rule = 5;
+    string rule_type = 6;
+    string rule_name = 7;


Ditto here -- we can just put the same rule_type data into rule, with a // deprecated, will be removed after 2024-03-01 comment, which will give people a month to upgrade their CLI.

Marked that field as deprecated for backward compatibility.

evankanderson · 2024-01-28T09:08:44Z

database/migrations/000013_rule_evaluations_rule_name.up.sql

+-- Recreate the unique index with rule_name
+CREATE UNIQUE INDEX rule_evaluations_results_idx
+  ON rule_evaluations (profile_id, repository_id, COALESCE(artifact_id, '00000000-0000-0000-0000-000000000000'::UUID),
+                       entity, rule_type_id, COALESCE(pull_request_id, '00000000-0000-0000-0000-000000000000'::UUID),
+                       rule_name) NULLS NOT DISTINCT;


Just reasoning this through... it's safe to create this as NULLS NOT DISTINCT because that's the state we have today, so incorporating the column like this should be safe. (And since we're in a transaction, dropping the old index before adding the new is also safe; the table should only be locked briefly at our current data size.)

Correct. The reason we use NULLS NOT DISTINCT is because if we don't, two rule evaluations, which are identical would now be unique and we would have something like (omitted some cols):

entity profile_id rule_type_id rule_name

repo 1 1 null

repo 1 1 null

Now this is good for the meantime, we would also fix evaluations for existing profiles, but when we try to backfill, we can end up with:

entity profile_id rule_type_id rule_name

repo 1 1 abc_rule

repo 1 1 abc_rule

Leading to a violation of our index. That's why we require NULLS NOT DISTINCT .

And since we're in a transaction, dropping the old index before adding the new is also safe; the table should only be locked briefly at our current data size.

Yes, it is safe.

To be double sure, I checked the docs again, and:

From https://www.postgresql.org/docs/15/sql-dropindex.html

A normal DROP INDEX acquires an ACCESS EXCLUSIVE lock on the table, blocking other accesses until the index drop can be completed.

From https://www.postgresql.org/docs/15/explicit-locking.html

Once acquired, a lock is normally held until the end of the transaction.

We don't have any savepoints so lock won't be released until the end of the transaction.

evankanderson · 2024-01-28T09:13:36Z

internal/controlplane/handlers_profile.go

+	// Adds default rule names, if not present
+	populateRuleNames(in)


Thought: do we want to do this before s.getAndValidateRulesFromProfile? I realize it probably doesn't matter at the moment, but it seems like we'd want to default values and then validate the request with the defaults present.

It would make the logic simple but, the errors won't be as clear, consider this profile:

repository: - type: dependabot_configured def: package_ecosystem: gomod - type: dependabot_configured def: package_ecosystem: npm

The user has not specified the name, but as we populate before validation, we would view the profile as:

repository: - type: dependabot_configured name: dependabot_configured def: package_ecosystem: gomod - type: dependabot_configured name: dependabot_configured def: package_ecosystem: npm

And we would return some error like: rule 'dependabot_configured' has the same name 'dependabot_configured' in entity 'repository', which the user might get confused by as they didn't specify any name.

Now, if you say, if ruleName == ruleType consider user supplied empty name. But that won't be true. Users can explicitly define the name to be equal to rule type, something like:

repository: - type: dependabot_configured name: dependabot_configured def: package_ecosystem: gomod - type: dependabot_configured def: package_ecosystem: npm

This may sound very improbable, but folks actually do stuff like this while writing any yaml.

For the original profile, an error message like cannot have rules with empty name and same type in entity 'repository' would be better and clearer to the user. Wdyt?

I'm okay with either error message -- we are changing the schema out from under users, so a message like

rule 'dependabot_configured' conflicts with 'dependabot_configured' in the same profile, assign a unique name to at least one.

would probably not confuse users even if they hadn't been using the name field previously.

Done, I modified the wording of errors to give some sort of solution as well.

evankanderson · 2024-01-28T09:23:54Z

internal/controlplane/handlers_profile.go

+		if errors.Is(err, sql.ErrNoRows) {
+			log.Printf("the rule instantiation for rule already existed.")


Q: If entity_profile_rules only keys off the rule_type_id and the entity_profile_id, won't we end up with multiple rules with different names corresponding to the same entity_profile_rules row?

A: This should be safe, because the only query of entity_profile_rules is ListProfilesInstantiatingRuleType in handlers_ruletype.go, which uses it to ensure that updated / delete rule types don't cause conflicts with profiles which reference them. For this use case, we don't need to know the name of the rule, only the type mapping.

Exactly, entity_profile_rules is only used to query the rule_types an entity in a profile uses. Elsewhere, we unmarshal entity_profiles.contextual_rules before processing rules.

(I'm using these comments to help myself and future folks follow the flow of the review -- sometime in ~June, someone will do a git blame and come back to this PR and try to understand the decisions we made. All of us will have forgotten, but GitHub won't!)

internal/controlplane/handlers_profile_test.go

evankanderson · 2024-01-28T10:28:03Z

internal/controlplane/handlers_profile_test.go

Thanks for writing these tests! They give me a lot more confidence in the new logic!

evankanderson · 2024-01-28T10:29:45Z

internal/controlplane/handlers_profile_test.go

+				{RuleType: "Type1", RuleName: "Name1"}: {Entity: minderv1.Entity_ENTITY_REPOSITORIES, RuleID: generateConsistentUUID(t, "Type1", "Name1")},
+				{RuleType: "Type1", RuleName: "Name2"}: {Entity: minderv1.Entity_ENTITY_BUILD_ENVIRONMENTS, RuleID: generateConsistentUUID(t, "Type1", "Name2")},


Do we want to have a test for names which have been "promoted" from the rule type? e.g.

Suggested change

{RuleType: "Type1", RuleName: "Name1"}: {Entity: minderv1.Entity_ENTITY_REPOSITORIES, RuleID: generateConsistentUUID(t, "Type1", "Name1")},

{RuleType: "Type1", RuleName: "Name2"}: {Entity: minderv1.Entity_ENTITY_BUILD_ENVIRONMENTS, RuleID: generateConsistentUUID(t, "Type1", "Name2")},

{RuleType: "Type1", RuleName: "Type1"}: {Entity: minderv1.Entity_ENTITY_REPOSITORIES, RuleID: generateConsistentUUID(t, "Type1", "Type1")},

{RuleType: "Type1", RuleName: "Type1"}: {Entity: minderv1.Entity_ENTITY_BUILD_ENVIRONMENTS, RuleID: generateConsistentUUID(t, "Type1", "Type1")},

(Since the rules apply to different entities, this shouldn't twig validateRuleNameAndTypeInProfile, though it would be a bit surprising to see in practice)

There is actually a bug in the profile handler that we can't handle rules with the same type in different entities. Me and @JAORMX discussed this in discord https://discord.com/channels/1184987096302239844/1185287949240242258/1199645373791219723 (Issue to be created)

I'm not sure "rules of the same type in different entities" makes sense (because a rule_type has an in_entity and rule type names are unique, IIRC). On the other hand, I'd also believe that we don't validate that correctly at the moment if you told me that.

I'll create an issue for this. We'll look at it separately

I'd rather defer this until we have a good enough testing framework that we can write a test to show what our behavior is, and then we can feel how we feel about what the test says.

evankanderson · 2024-01-28T10:39:06Z

internal/engine/actions/alert/security_advisory/security_advisory.go

+
+- Profile: {{.Profile}}
+- Rule: {{.Rule}}
+- Name: {{.Name}}


Why not keep the existing template, and render this as:

Suggested change

- Name: {{.Name}}

{{if .Name -}}

- Name: {{.Name}}

{{end-}}

(In fact, I'm not sure why we didn't do that on the remediation part, but don't do that as part of this PR. 😁

I'll do this in a cleanup PR.

Sorry, I was unclear here -- I meant "don't add tmplPart3Bottom{No,With}Name, but don't remove the description{NoRem}Tmpl split in this PR.

In these trim markers, the white space must be present: "{{- 3}}" is like "{{3}}" but trims the immediately preceding text, while "{{-3}}" parses as an action containing the number -3.

Whitespace got me. Done, fixed.

This now being a 7-line diff in the file fills my heart with joy, thank you!

evankanderson · 2024-01-28T10:42:54Z

internal/engine/eval_status.go

+	ruleName := sql.NullString{
+		String: rule.Name,
+		Valid:  true,
+	}
+
+	// TODO: Remove this after migration, ruleName would be valid after updating existing evaluations (#1609)
+	if len(rule.Name) == 0 {
+		ruleName.Valid = false
+	}


Not for this PR, but it feels like we could use a util.SqlEmptyIsNull(string) &sql.NullString helper, WDYT?

For this particular one, you can write:

Suggested change

ruleName := sql.NullString{

String: rule.Name,

Valid: true,

}

// TODO: Remove this after migration, ruleName would be valid after updating existing evaluations (#1609)

if len(rule.Name) == 0 {

ruleName.Valid = false

}

ruleName := sql.NullString{

String: rule.Name,

// TODO: Set to true after migration, ruleName would be valid after updating existing evaluations (#1609)

Valid: rule.Name != "",

}

Vyom-Yadav · 2024-01-30T12:10:02Z

@evankanderson These points are to address concerns outside of the review. I answered these questions in the review, too.

I repeated this a bunch of times, but doing the defaulting before validation will probably simplify your logic, as right now you're trying to take into account what the defaults will be before they're written into the input.

See #2161 (comment). Validation before population helps us return fine grained error messages.

I think we probably need to keep the badly-named rule fields in the proto for at least a month or so to give people a chance to migrate. We can put a big comment on them that the field is going away, but we probably don't want to break compatibility for all our existing users.

I marked those fields as deprecated to keep backwards compatibility.

I've pointed out a few places where the naming is "java-flavored" (or "C#-flavored"). Go tends to towards more terse names (see the end of package names in Effective Go for an example). As another example, ruleTypeAndNamePair could probably be called ruleKey at no loss of understanding. For folks using more than one programming language, it can be tricky to keep track of the different "accents", so I'm not asking for fixes here. 😁

Can't say I'm not guilty of that 😁, going forward, I would use go-flavoured names.

Another style note for future PRs (don't apply it here) is to separate refactoring changes (which are super-fast to review) from logic changes (which usually require a bit more thought). The rename of dbstat --> dbProfileStatus is an example of this: it adds a bunch of diffs to the UI that don't really affect correctness, but make it harder to see where other code is changing (because there's already a change on the line).

Sorry for that noise, I will take care of that in the future.

evankanderson

I'm happy to merge this as-is unless you want to address any of the comments I made this (morning for me, evening for you)

evankanderson · 2024-01-30T14:43:11Z

internal/controlplane/handlers_profile.go

+		//nolint:staticcheck // ignore SA1019: Deprecated field supported for backward compatibility
+		ruleType = &sql.NullString{
+			String: in.GetRule(),
+			Valid:  in.GetRule() != "",


Non-blocking, but one of rule_type or rule should be set, right? (Otherwise, this is very neat logic, very nice!)

No, rule or rule_type and rule_name both are optional. They are associated with --rule/--ruleType and --ruleName flags respectively, so if user doesn't specify one, we return the complete eval status for a policy.

evankanderson · 2024-01-30T14:51:54Z

internal/engine/actions/alert/security_advisory/security_advisory.go

+
+- Profile: {{.Profile}}
+- Rule: {{.Rule}}
+- Name: {{.Name}}


Sorry, I was unclear here -- I meant "don't add tmplPart3Bottom{No,With}Name, but don't remove the description{NoRem}Tmpl split in this PR.

evankanderson · 2024-01-31T13:44:14Z

You'll need to merge and re-generate minder.pb.go with make gen, it looks like.

Vyom-Yadav · 2024-01-31T13:45:17Z

You'll need to merge and re-generate minder.pb.go with make gen, it looks like.

Yupp, pushing the changes... (doing some test improvements)

Vyom-Yadav · 2024-01-31T15:38:35Z

@evankanderson Ready for review. Changes from the last review are:

Modified validation logic to allow usage of rule_type as name when it does not conflict with the default name.
Added this point to the documentation.
Added additional tests for the same.
Modified validation error messages to provide more information.
Used conditions in the HTML template.

Signed-off-by: Vyom-Yadav <[email protected]>

…e - Part 1 (No Data Backfilling) * Rule Name is a user-defined field for differentiating same rule types under an entity * Existing rules in DB would have rule_name as null and would be processed without a rule name * For creating/updating profiles, rule name would be mandatory to avoid collisions Signed-off-by: Vyom-Yadav <[email protected]>

evankanderson

I have some tiny comments, but I'd rather get this in than fix them in this PR.

evankanderson · 2024-01-31T17:57:29Z

docs/docs/how-to/create_profile.md

+## Defining Rule Names in Profiles
+
+In Minder profiles, rules are identified by their type and, optionally, a unique name.


It feels like it would be simpler to say "by default, a rule is named for the type it references", rather than talking about the named / unnamed case. Most people are probably familiar with defaulting.

evankanderson · 2024-01-31T17:58:08Z

docs/docs/how-to/create_profile.md

If we say that rule names default to the type name, then they're always present (and the same rules always apply), but sometimes you don't need to do so much typing.

evankanderson · 2024-01-31T17:59:44Z

docs/docs/how-to/create_profile.md

+  - type: secret_scanning
+    name: "secret_scanning_github"
+    def:
+      enabled: true
+  - type: secret_scanning
+    name: "secret_scanning_github_2"
+    def:
+      enabled: false


The example from https://github.com/stacklok/minder-rules-and-profiles/blob/944f5f6cf60362250e82e874e4d7c296653f2ef9/profiles/github/profile.yaml#L38 might be more relevant:

- type: dependabot_configured def: package_ecosystem: gomod schedule_interval: weekly apply_if_file: go.mod - type: dependabot_configured def: package_ecosystem: npm schedule_interval: weekly apply_if_file: package.json

Oops, I did this comment twice. Sorry!

evankanderson · 2024-01-31T18:01:55Z

docs/docs/how-to/create_profile.md

+repository:
+  - type: secret_scanning
+    name: "secret_scanning_github"
+    def:
+      enabled: true
+  - type: secret_scanning
+    name: "secret_scanning_github_2"
+    def:
+      enabled: false
+```
+Here, we have two rules of the same type `secret_scanning` under the `repository` entity. Each rule has a unique name.


The example from https://github.com/stacklok/minder-rules-and-profiles/blob/944f5f6cf60362250e82e874e4d7c296653f2ef9/profiles/github/profile.yaml#L38 might be more relevant:

Suggested change

```yaml

repository:

- type: secret_scanning

name: "secret_scanning_github"

def:

enabled: true

- type: secret_scanning

name: "secret_scanning_github_2"

def:

enabled: false

```

Here, we have two rules of the same type `secret_scanning` under the `repository` entity. Each rule has a unique name.

```yaml

repository:

- type: dependabot_configured

name: go_dependabot

def:

package_ecosystem: gomod

schedule_interval: weekly

apply_if_file: go.mod

- type: dependabot_configured

name: node_dependabot

def:

package_ecosystem: npm

schedule_interval: weekly

apply_if_file: package.json

```

Here, we have two rules of the same type `dependabot_configured` under the `repository` entity. Each rule has a unique name.

evankanderson · 2024-01-31T18:31:21Z

internal/controlplane/handlers_profile_test.go

+				{RuleType: "Type1", RuleName: "Name1"}: {Entity: minderv1.Entity_ENTITY_REPOSITORIES, RuleID: generateConsistentUUID(t, "Type1", "Name1")},
+				{RuleType: "Type1", RuleName: "Name2"}: {Entity: minderv1.Entity_ENTITY_BUILD_ENVIRONMENTS, RuleID: generateConsistentUUID(t, "Type1", "Name2")},


I'd rather defer this until we have a good enough testing framework that we can write a test to show what our behavior is, and then we can feel how we feel about what the test says.

evankanderson · 2024-01-31T18:32:40Z

internal/engine/actions/alert/security_advisory/security_advisory.go

+
+- Profile: {{.Profile}}
+- Rule: {{.Rule}}
+- Name: {{.Name}}


This now being a 7-line diff in the file fills my heart with joy, thank you!

Code has been reworked substantially to address comments.

Vyom-Yadav requested a review from a team as a code owner January 21, 2024 19:49

Vyom-Yadav force-pushed the issue-1609-name-method branch 2 times, most recently from 471dbc8 to fa60455 Compare January 22, 2024 15:17

eleftherias previously requested changes Jan 22, 2024

View reviewed changes

proto/minder/v1/minder.proto Outdated Show resolved Hide resolved

evankanderson reviewed Jan 23, 2024

View reviewed changes

Vyom-Yadav force-pushed the issue-1609-name-method branch from fa60455 to fb78963 Compare January 23, 2024 19:23

Vyom-Yadav changed the title ~~Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type~~ Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 1 (No Migrations) Jan 23, 2024

Vyom-Yadav force-pushed the issue-1609-name-method branch 2 times, most recently from fa84592 to 4dee699 Compare January 23, 2024 19:28

Vyom-Yadav marked this pull request as draft January 23, 2024 19:55

Vyom-Yadav force-pushed the issue-1609-name-method branch from 4dee699 to 4522457 Compare January 26, 2024 07:56

Vyom-Yadav changed the title ~~Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 1 (No Migrations)~~ Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 1 (No Data Backfilling) Jan 26, 2024

Vyom-Yadav force-pushed the issue-1609-name-method branch from 4522457 to c004ebc Compare January 26, 2024 08:15

Vyom-Yadav requested review from eleftherias and evankanderson January 26, 2024 08:27

Vyom-Yadav marked this pull request as ready for review January 26, 2024 08:27

Vyom-Yadav mentioned this pull request Jan 26, 2024

Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 2 (Data Backfilling) #2206

Merged

Vyom-Yadav force-pushed the issue-1609-name-method branch from c004ebc to 9e0fafb Compare January 26, 2024 17:22

Vyom-Yadav force-pushed the issue-1609-name-method branch from 9e0fafb to 5aea99f Compare January 27, 2024 08:19

evankanderson reviewed Jan 28, 2024

View reviewed changes

Vyom-Yadav force-pushed the issue-1609-name-method branch 3 times, most recently from c7e67ce to ce1c445 Compare January 30, 2024 11:54

Vyom-Yadav requested a review from evankanderson January 30, 2024 12:05

evankanderson previously approved these changes Jan 30, 2024

View reviewed changes

Vyom-Yadav dismissed evankanderson’s stale review via dbd39f4 January 31, 2024 06:56

Vyom-Yadav force-pushed the issue-1609-name-method branch from ce1c445 to dbd39f4 Compare January 31, 2024 06:56

Vyom-Yadav force-pushed the issue-1609-name-method branch 6 times, most recently from 827fa9d to 3c89f0e Compare January 31, 2024 15:31

Vyom-Yadav requested a review from evankanderson January 31, 2024 15:35

Vyom-Yadav added 2 commits January 31, 2024 22:47

minor: Remove incorrect package comment

6294216

Signed-off-by: Vyom-Yadav <[email protected]>

Vyom-Yadav force-pushed the issue-1609-name-method branch from 3c89f0e to b476b34 Compare January 31, 2024 17:20

evankanderson approved these changes Jan 31, 2024

View reviewed changes

evankanderson merged commit cb3f328 into mindersec:main Jan 31, 2024
20 checks passed

evankanderson mentioned this pull request Feb 2, 2024

Add names to dependabot rule to make the profile valid mindersec/minder-rules-and-profiles#46

Merged

		// Adds default rule names, if not present
		populateRuleNames(in)

		if errors.Is(err, sql.ErrNoRows) {
		log.Printf("the rule instantiation for rule already existed.")

		{RuleType: "Type1", RuleName: "Name1"}: {Entity: minderv1.Entity_ENTITY_REPOSITORIES, RuleID: generateConsistentUUID(t, "Type1", "Name1")},
		{RuleType: "Type1", RuleName: "Name2"}: {Entity: minderv1.Entity_ENTITY_BUILD_ENVIRONMENTS, RuleID: generateConsistentUUID(t, "Type1", "Name2")},

		## Defining Rule Names in Profiles

		In Minder profiles, rules are identified by their type and, optionally, a unique name.

-```yaml
-repository:
-  - type: secret_scanning
-    name: "secret_scanning_github"
-    def:
-      enabled: true
-  - type: secret_scanning
-    name: "secret_scanning_github_2"
-    def:
-      enabled: false
-```
-Here, we have two rules of the same type `secret_scanning` under the `repository` entity. Each rule has a unique name.
+```yaml
+repository:
+  - type: dependabot_configured
+    name: go_dependabot
+    def:
+      package_ecosystem: gomod
+      schedule_interval: weekly
+      apply_if_file: go.mod
+  - type: dependabot_configured
+    name: node_dependabot
+    def:
+      package_ecosystem: npm
+      schedule_interval: weekly
+      apply_if_file: package.json
+```
+Here, we have two rules of the same type `dependabot_configured` under the `repository` entity. Each rule has a unique name.

Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 1 (No Data Backfilling) #2161

Fix Rule Evaluation Logic for Handling Multiple Rules of the Same Type - Part 1 (No Data Backfilling) #2161

Conversation

Vyom-Yadav commented Jan 21, 2024 • edited Loading

eleftherias left a comment

Choose a reason for hiding this comment

Vyom-Yadav commented Jan 22, 2024 • edited Loading

evankanderson commented Jan 22, 2024

evankanderson commented Jan 22, 2024

evankanderson commented Jan 22, 2024

evankanderson commented Jan 22, 2024

Vyom-Yadav commented Jan 22, 2024 • edited Loading

Vyom-Yadav commented Jan 22, 2024

Vyom-Yadav commented Jan 22, 2024

evankanderson commented Jan 23, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson commented Jan 24, 2024

Vyom-Yadav commented Jan 25, 2024

Vyom-Yadav commented Jan 26, 2024

evankanderson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Vyom-Yadav Jan 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Vyom-Yadav commented Jan 30, 2024

evankanderson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

evankanderson commented Jan 31, 2024

Vyom-Yadav commented Jan 31, 2024 • edited Loading

Vyom-Yadav commented Jan 31, 2024 • edited Loading

evankanderson left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Vyom-Yadav commented Jan 21, 2024 •

edited

Loading

Vyom-Yadav commented Jan 22, 2024 •

edited

Loading

Vyom-Yadav commented Jan 22, 2024 •

edited

Loading

Vyom-Yadav Jan 28, 2024 •

edited

Loading

Vyom-Yadav commented Jan 31, 2024 •

edited

Loading

Vyom-Yadav commented Jan 31, 2024 •

edited

Loading