Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add the protobuf message and the selector API #3797

Merged
merged 5 commits into from
Jul 19, 2024

Conversation

jhrozek
Copy link
Contributor

@jhrozek jhrozek commented Jul 5, 2024

Summary

Define the protobuf API for selectors - Creates a new protobuf message SelectorEntity that represents a set of attributes for a typed entity to be used in profile selectors.

At the moment, onyl SelectorRepository and SelectorArtifact are used and implemented. Additionally, the key-value store is not implemented either as it requires a new set of provider interfaces to fill this store.

Add the selector evalautor - Adds a new module called selectors that initializes CEL environments and evaluates selector

Fixes #3757

Change Type

Mark the type of change your PR introduces:

  • Bug fix (resolves an issue without affecting existing features)
  • Feature (adds new functionality without breaking changes)
  • Breaking change (may impact existing functionalities or require documentation updates)
  • Documentation (updates or additions to documentation)
  • Refactoring or test improvements (no bug fixes or new functionality)

Testing

there are unit tests included. I also cherry-pick these from a larger branch that takes this code into account

Review Checklist:

  • Reviewed my own code for quality and clarity.
  • Added comments to complex or tricky code sections.
  • Updated any affected documentation.
  • Included tests that validate the fix or feature.
  • Checked that related changes are merged.

@jhrozek jhrozek requested a review from a team as a code owner July 5, 2024 19:28
@coveralls
Copy link

Coverage Status

coverage: 52.505% (+0.3%) from 52.204%
when pulling 2aeb88c on jhrozek:selectors_api
into 6274fe5 on stacklok:main.

proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
Comment on lines 2633 to 2634
// the name of the entity, same as the name in the entity message
string name = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the name of the entity globally unique, or do we need a provider to scope the name (or do we not care about uniqueness guarantees here)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to add the provider name as a separate attribute alongside name when we add fetching of the key-value pairs from the provider. Since you suggested to just use full_name were you thinking to make the provider name part of the full name as well?

I like how unambiguous that would be, but if we don't provide a very easy way "just ignore this repo" especially from the UI then the grammar gets less useful to the user.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want one property for the full_name, and one property for the provider. That would allow for full distinguishing, but would default to names that were recognizable to people without some sort of special prefix.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might also simplify "apply this policy to all billing/fooserver images at all registries under Minder automation"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a provider message to the messages is that what you were describing? There's name and provider class exposed for start.

Copy link
Contributor

@blkt blkt Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

question (non-blocking): is there a particular reason why we're not calculating an exact, possibly unambiguous name and storing it alongside the entity? It really feels like we should have for repositories something like a URI/purl alongside repo name, owner, and provider, something like <provider>:<type>:<owner>:<name>, e.g. github-app-foo:repository:stacklok:minder. On top of that, we can define custom functions to ship with the CEL evaluator that operate on top of that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do this as well and Evan suggested something similar. Good idea, thanks.

proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
proto/minder/v1/minder.proto Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
internal/engine/selectors/selectors.go Outdated Show resolved Hide resolved
@jhrozek
Copy link
Contributor Author

jhrozek commented Jul 11, 2024

thanks for the reviews @evankanderson @blkt . I'm going to resolve the comments that I think are resolved (but feel free to unresolve if you don't think they are fully addressed) and I'm going to comments on parts of the code where I'm unsure that I'm going in the right direction.

@coveralls
Copy link

coveralls commented Jul 11, 2024

Coverage Status

coverage: 54.175% (+0.2%) from 53.955%
when pulling fdf1ff6 on jhrozek:selectors_api
into 079001e on stacklok:main.

string name = 1;
// the class of the provider, e.g. github-app
string class = 2;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This message is new in this PR version. It's just a way to avoid repeating what attributes we want to expose for provider.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this both in SelectorRepository/SelectorArtifact and in SelectorEntity? It feels like both copies of the message should have the same contents, so maybe we should have a canonical place?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the example here. Since the individual selectors can either be targetting a specific entity or any entity (but what is being evaluated is always an specific entity), the canonical place is the entity-specific message (e.g. SelectorRepository). What's stored in the generic message (SelectorEntity) is just a copy of the common data.

Is that confusing as a concept or would it help to make it crisper in the code in the sense that the provider methods that fill in the selector entity struct just fill in the entity-specific message and the generic attributes are copied by a common function?

//go:generate go run go.uber.org/mock/mockgen -package mock_$GOPACKAGE -destination=./mock/$GOFILE -source=./$GOFILE

// RepoSelectorConverter is an interface for converting a repository to a repository selector
type RepoSelectorConverter interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PTAL here. This is a provider-private interface, meaning it's not part of the public provider interface in pkg/ but it's in internal - the messages are stored in the private proto file as well.

This hopefully transfers the onus of creating the SelectorEntity message from the selector interface to the providers where the user of the selector module (typically the evaluator engine) just calls the provider instance to retrieve a SelectorEntity for this entity from its provider.

In a follow-up, I'd like to also add the key-value map here, but this PR uses just the attributes in the SelectorEntity message and the per-entity nested messages.

)

// entityInfoConverter is an interface for converting an entity from an EntityInfoWrapper to a SelectorEntity
type entityInfoConverter interface {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed the interface is misnamed, it's no longer tied to EntityInfoWrapper, but just a pair an entity as a proto.message and entity type.

converters: map[minderv1.Entity]entityInfoConverter{
minderv1.Entity_ENTITY_REPOSITORIES: newRepositoryInfoConverter(provider),
minderv1.Entity_ENTITY_ARTIFACTS: newArtifactInfoConverter(provider),
},
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@puerco this is one of the things I don't like here, sure the code doesn't use a straight-up switch-case but a factory which is somewhat nicer, but we still hardcode the entities we know about..
I don't know if this is a good direction.

Copy link
Contributor

@blkt blkt Jul 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have a bunch of places where we switch over entity types. We might implement some sort of registration mechanism alla init(), something like this.

package custom

import (
        "github.com/stacklok/minder/internal/entities
)

init() {
        err := entities.Register(customv1.Entity_MY_CUSTOM_ENTITY, &CustomEntity{})
        if err != nil {
                log.Fatalf("failed registering custom entity")
        }
}

type CustomEntity struct {
}

func (c *CustomEntity) Converter(...) error {
        return nil
}

The nice thing about this is that this could be dynamically loaded as a plugin, or be otherwise loaded in the system.
Could this be a decent tool for this purpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is pretty much along the lines of what we discussed yesterday on Slack with @puerco . I'm not sure if the implementation will be a plugin (perhaps we should go all the way and use RPC plugins), that's a discussion we should have separately, but I agree that a registry of entities would be nice.

@jhrozek
Copy link
Contributor Author

jhrozek commented Jul 11, 2024

@dmjb I'd appreciate your thoughts here as well, you usually have a good direction on how to split the interfaces nicely.

Copy link
Member

@evankanderson evankanderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would help me a bit to see (in a comment) what this looks like wired in. I'm assuming that we'll have something like the following in executor.EvalEntityEvent:

	// entityProto is a SelectorEntity proto
	entityProto, err := provider.GetEntityObject(inf)
	// ...
	err = e.forProjectsInHierarchy(
		ctx, inf, func(ctx context.Context, profile *pb.Profile, hierarchy []uuid.UUID) error {
			selector := selectors.SelectorForProfile(profile)
			if !selector.Select(entityProto) {
				// record as skipped
				return nil  // or SkippedErr
			}
			// Get only these rules that are relevant for this entity type
			relevant, err := profiles.GetRulesForEntity(profile, inf.Type)
			if err != nil {
				return fmt.Errorf("error getting rules for entity: %w", err)
			}

But, this formulation suggests that the Env and SelectionBuilder interfaces shouldn't need to take a minderv1.Entity as a property, and should simply spit out a selector that works on generic SelectorEntity protobufs.

Comment on lines 103 to 120
// one of repository, pull_request, artifact (see oneof entity)
minder.v1.Entity entity_type = 1;
// the name of the entity, same as the name in the entity message
string name = 2;
SelectorProvider provider = 3;

oneof entity {
SelectorRepository repository = 4;
SelectorArtifact artifact = 5;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal implementation detail that currently makes it easier to implement CEL, right?

I'm concerned over time that encoding all the properties into a protocol buffer in core Minder will make it harder for new providers to add properties that make sense, but we can cross that bridge when we get there. (The alternative is that it's easy for providers to add new properties, but little uniformity across providers -- this pushes us on the higher-friction-but-more-commonality path, which might even be right.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an internal implementation detail that currently makes it easier to implement CEL, right?

yes

I'm concerned over time that encoding all the properties into a protocol buffer in core Minder will make it harder for new providers to add properties that make sense, but we can cross that bridge when we get there. (The alternative is that it's easy for providers to add new properties, but little uniformity across providers -- this pushes us on the higher-friction-but-more-commonality path, which might even be right.)

Right, I hear you. The properties that are expressed as attributes are only those that those entities will most commonly have (names, is_fork/is_private and those are optional bool, too). I expect any provider-specific attributes to be just key-value pairs in a map.

We could even go with just a map, but the nice thing about passing proto messages to CEL is that you CEL can check if the attributes exist or warn about mismatch between entity types and the attribute. I think it would be nice to surface these to the user when they create the selectors.

We won't have those for a generic map.

string name = 1;
// the class of the provider, e.g. github-app
string class = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this both in SelectorRepository/SelectorArtifact and in SelectorEntity? It feels like both copies of the message should have the same contents, so maybe we should have a canonical place?

Comment on lines +85 to +91
// not applicable to this provider
optional bool is_fork = 3;
// is_private is true if the repository is private, nil if "don't know" or rather
// not applicable to this provider
optional bool is_private = 4;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to include the "Custom properties" key/values that GitHub supports under a name like "tags", "labels", or "properties"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I wanted to do that in a follow-up but perhaps it makes sense to include the k-v property already to make the review easier in the sense that we'd see the whole message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the properties to the message, filling them in would be done in a follow-up

Comment on lines +106 to +112
string name = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is to allow for access as both name and repository.name?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should have spelled this out more prominently in the design doc. There's some examples in the unit tests

and here's a simple inline example. Given these selectors:

  - comment: This should not apply to bad-go
    entity: repository
    selector: repository.name != 'jakubtestorg/bad-go'
  - comment: Skip any entity that contains demo in the name
    selector: entity.name.contains('demo') == false

The first one applies to repositories only, in this case the selector uses repository.name, but it could use also repository.provider.name/class, repository.is_fork or repository.is_private.

The second one applies to any entity type. It can use only attributes from the SelectorEntity message which currently are entity.name, entity.provider.name and entity.provider.class. (Let's talk about adding the purl or global name separately).

In both the entity-generic message and the entity-specific message we'll also add the k-v map.


// ArtifactToSelectorEntity converts an Artifact to a SelectorEntity
func (_ *GitHub) ArtifactToSelectorEntity(_ context.Context, a *minderv1.Artifact) *internalpb.SelectorEntity {
fullName := fmt.Sprintf("%s/%s", a.GetOwner(), a.GetName())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this include ghcr.io/? (I'm not sure)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would say that not the name attribute, but I agree with both you and @blkt (in this comment) that we should have a global name as well.

I think just based on what users would expect in the simplest case we could have:

  • .name - for repos this would be orgname/reponame
  • .full_name - this could be a clone name perhaps. For repos, https://github.com/org/repo for artifacts registry_name/namespace/artifact_name
  • .id (help me come with better name :-)) - a minder-specific identifier. I liked @blkt's suggestion

btw note that as long as we expose the individual attributes the user can also combine them with logical operations in the CEL expression so we don't have to go overboard with providing too many one-off attributes.

Comment on lines +94 to +98
string name = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want a server component here, e.g. ghcr.io or eu.gcr.io?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, good point. I'm not sure if we store them for artifacts (or for repos for that matter, we do have a clone_url for repos), but I like the idea.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't that be part of the provider information?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can add that iteratively in another PR.

@jhrozek
Copy link
Contributor Author

jhrozek commented Jul 12, 2024

It would help me a bit to see (in a comment) what this looks like wired in. I'm assuming that we'll have something like the following in executor.EvalEntityEvent:

	// entityProto is a SelectorEntity proto
	entityProto, err := provider.GetEntityObject(inf)
	// ...
	err = e.forProjectsInHierarchy(
		ctx, inf, func(ctx context.Context, profile *pb.Profile, hierarchy []uuid.UUID) error {
			selector := selectors.SelectorForProfile(profile)
			if !selector.Select(entityProto) {
				// record as skipped
				return nil  // or SkippedErr
			}
			// Get only these rules that are relevant for this entity type
			relevant, err := profiles.GetRulesForEntity(profile, inf.Type)
			if err != nil {
				return fmt.Errorf("error getting rules for entity: %w", err)
			}

But, this formulation suggests that the Env and SelectionBuilder interfaces shouldn't need to take a minderv1.Entity as a property, and should simply spit out a selector that works on generic SelectorEntity protobufs.

@evankanderson I keep a branch with patches that will be sent later. The wiring looks like this in executor:

			// this could be a private function for now and we could just select an inf
			selectors, err := e.selBuilder.NewSelectionFromProfile(inf.Type, profile.Selection)
			if err != nil {
				return fmt.Errorf("error creating selectors: %w", err)
			}
			selected, err := selectors.Select(provsel.EntityToSelectorEntity(ctx, provider, inf.Type, inf.Entity))
			if err != nil {
				return fmt.Errorf("error selecting entity: %w", err)
			}

@evankanderson
Copy link
Member

Why does extracting the entity properties need to know the profile? Naively, I'd expect that we could extract the entity properties once per evaluation, not once per profile in the evaluation.

@jhrozek
Copy link
Contributor Author

jhrozek commented Jul 12, 2024

Why does extracting the entity properties need to know the profile? Naively, I'd expect that we could extract the entity properties once per evaluation, not once per profile in the evaluation.

(not 100% sure I'm answering your question without more context)

it doesn't, we can call the extraction earlier. We should also only build the entity properties (the selector message) once we hit the first profile with non-zero selectors because extracting the entity properties might involve calling to the provider (e.g. to get the key-value pairs which we don't store in the database) which might be expensive.

@jhrozek jhrozek force-pushed the selectors_api branch 2 times, most recently from b6e2cfc to fdf1ff6 Compare July 18, 2024 19:55
@jhrozek jhrozek merged commit 8d7f92f into mindersec:main Jul 19, 2024
21 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define the selector grammar and the associated protobuf messages
5 participants