-
Notifications
You must be signed in to change notification settings - Fork 42
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add the protobuf message and the selector API #3797
Conversation
proto/minder/v1/minder.proto
Outdated
// the name of the entity, same as the name in the entity message | ||
string name = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the name of the entity globally unique, or do we need a provider to scope the name (or do we not care about uniqueness guarantees here)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to add the provider name as a separate attribute alongside name when we add fetching of the key-value pairs from the provider. Since you suggested to just use full_name were you thinking to make the provider name part of the full name as well?
I like how unambiguous that would be, but if we don't provide a very easy way "just ignore this repo" especially from the UI then the grammar gets less useful to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we might want one property for the full_name
, and one property for the provider
. That would allow for full distinguishing, but would default to names that were recognizable to people without some sort of special prefix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might also simplify "apply this policy to all billing/fooserver
images at all registries under Minder automation"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a provider message to the messages is that what you were describing? There's name and provider class exposed for start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
question (non-blocking): is there a particular reason why we're not calculating an exact, possibly unambiguous name and storing it alongside the entity? It really feels like we should have for repositories something like a URI/purl alongside repo name, owner, and provider, something like <provider>:<type>:<owner>:<name>
, e.g. github-app-foo:repository:stacklok:minder
. On top of that, we can define custom functions to ship with the CEL evaluator that operate on top of that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can do this as well and Evan suggested something similar. Good idea, thanks.
thanks for the reviews @evankanderson @blkt . I'm going to resolve the comments that I think are resolved (but feel free to unresolve if you don't think they are fully addressed) and I'm going to comments on parts of the code where I'm unsure that I'm going in the right direction. |
string name = 1; | ||
// the class of the provider, e.g. github-app | ||
string class = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This message is new in this PR version. It's just a way to avoid repeating what attributes we want to expose for provider.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this both in SelectorRepository
/SelectorArtifact
and in SelectorEntity
? It feels like both copies of the message should have the same contents, so maybe we should have a canonical place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See the example here. Since the individual selectors can either be targetting a specific entity or any entity (but what is being evaluated is always an specific entity), the canonical place is the entity-specific message (e.g. SelectorRepository
). What's stored in the generic message (SelectorEntity
) is just a copy of the common data.
Is that confusing as a concept or would it help to make it crisper in the code in the sense that the provider methods that fill in the selector entity struct just fill in the entity-specific message and the generic attributes are copied by a common function?
//go:generate go run go.uber.org/mock/mockgen -package mock_$GOPACKAGE -destination=./mock/$GOFILE -source=./$GOFILE | ||
|
||
// RepoSelectorConverter is an interface for converting a repository to a repository selector | ||
type RepoSelectorConverter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL here. This is a provider-private interface, meaning it's not part of the public provider interface in pkg/
but it's in internal
- the messages are stored in the private proto file as well.
This hopefully transfers the onus of creating the SelectorEntity
message from the selector interface to the providers where the user of the selector module (typically the evaluator engine) just calls the provider instance to retrieve a SelectorEntity
for this entity from its provider.
In a follow-up, I'd like to also add the key-value map here, but this PR uses just the attributes in the SelectorEntity
message and the per-entity nested messages.
) | ||
|
||
// entityInfoConverter is an interface for converting an entity from an EntityInfoWrapper to a SelectorEntity | ||
type entityInfoConverter interface { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just noticed the interface is misnamed, it's no longer tied to EntityInfoWrapper
, but just a pair an entity as a proto.message
and entity type.
converters: map[minderv1.Entity]entityInfoConverter{ | ||
minderv1.Entity_ENTITY_REPOSITORIES: newRepositoryInfoConverter(provider), | ||
minderv1.Entity_ENTITY_ARTIFACTS: newArtifactInfoConverter(provider), | ||
}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@puerco this is one of the things I don't like here, sure the code doesn't use a straight-up switch-case but a factory which is somewhat nicer, but we still hardcode the entities we know about..
I don't know if this is a good direction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a bunch of places where we switch over entity types. We might implement some sort of registration mechanism alla init()
, something like this.
package custom
import (
"github.com/stacklok/minder/internal/entities
)
init() {
err := entities.Register(customv1.Entity_MY_CUSTOM_ENTITY, &CustomEntity{})
if err != nil {
log.Fatalf("failed registering custom entity")
}
}
type CustomEntity struct {
}
func (c *CustomEntity) Converter(...) error {
return nil
}
The nice thing about this is that this could be dynamically loaded as a plugin, or be otherwise loaded in the system.
Could this be a decent tool for this purpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is pretty much along the lines of what we discussed yesterday on Slack with @puerco . I'm not sure if the implementation will be a plugin (perhaps we should go all the way and use RPC plugins), that's a discussion we should have separately, but I agree that a registry of entities would be nice.
@dmjb I'd appreciate your thoughts here as well, you usually have a good direction on how to split the interfaces nicely. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would help me a bit to see (in a comment) what this looks like wired in. I'm assuming that we'll have something like the following in executor.EvalEntityEvent
:
// entityProto is a SelectorEntity proto
entityProto, err := provider.GetEntityObject(inf)
// ...
err = e.forProjectsInHierarchy(
ctx, inf, func(ctx context.Context, profile *pb.Profile, hierarchy []uuid.UUID) error {
selector := selectors.SelectorForProfile(profile)
if !selector.Select(entityProto) {
// record as skipped
return nil // or SkippedErr
}
// Get only these rules that are relevant for this entity type
relevant, err := profiles.GetRulesForEntity(profile, inf.Type)
if err != nil {
return fmt.Errorf("error getting rules for entity: %w", err)
}
But, this formulation suggests that the Env
and SelectionBuilder
interfaces shouldn't need to take a minderv1.Entity
as a property, and should simply spit out a selector that works on generic SelectorEntity
protobufs.
internal/proto/internal.proto
Outdated
// one of repository, pull_request, artifact (see oneof entity) | ||
minder.v1.Entity entity_type = 1; | ||
// the name of the entity, same as the name in the entity message | ||
string name = 2; | ||
SelectorProvider provider = 3; | ||
|
||
oneof entity { | ||
SelectorRepository repository = 4; | ||
SelectorArtifact artifact = 5; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an internal implementation detail that currently makes it easier to implement CEL, right?
I'm concerned over time that encoding all the properties into a protocol buffer in core Minder will make it harder for new providers to add properties that make sense, but we can cross that bridge when we get there. (The alternative is that it's easy for providers to add new properties, but little uniformity across providers -- this pushes us on the higher-friction-but-more-commonality path, which might even be right.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an internal implementation detail that currently makes it easier to implement CEL, right?
yes
I'm concerned over time that encoding all the properties into a protocol buffer in core Minder will make it harder for new providers to add properties that make sense, but we can cross that bridge when we get there. (The alternative is that it's easy for providers to add new properties, but little uniformity across providers -- this pushes us on the higher-friction-but-more-commonality path, which might even be right.)
Right, I hear you. The properties that are expressed as attributes are only those that those entities will most commonly have (names, is_fork/is_private and those are optional bool, too). I expect any provider-specific attributes to be just key-value pairs in a map.
We could even go with just a map, but the nice thing about passing proto messages to CEL is that you CEL can check if the attributes exist or warn about mismatch between entity types and the attribute. I think it would be nice to surface these to the user when they create the selectors.
We won't have those for a generic map.
string name = 1; | ||
// the class of the provider, e.g. github-app | ||
string class = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this both in SelectorRepository
/SelectorArtifact
and in SelectorEntity
? It feels like both copies of the message should have the same contents, so maybe we should have a canonical place?
// not applicable to this provider | ||
optional bool is_fork = 3; | ||
// is_private is true if the repository is private, nil if "don't know" or rather | ||
// not applicable to this provider | ||
optional bool is_private = 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want to include the "Custom properties" key/values that GitHub supports under a name like "tags", "labels", or "properties"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I wanted to do that in a follow-up but perhaps it makes sense to include the k-v property already to make the review easier in the sense that we'd see the whole message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the properties to the message, filling them in would be done in a follow-up
string name = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is to allow for access as both name
and repository.name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I should have spelled this out more prominently in the design doc. There's some examples in the unit tests
and here's a simple inline example. Given these selectors:
- comment: This should not apply to bad-go
entity: repository
selector: repository.name != 'jakubtestorg/bad-go'
- comment: Skip any entity that contains demo in the name
selector: entity.name.contains('demo') == false
The first one applies to repositories only, in this case the selector uses repository.name
, but it could use also repository.provider.name/class
, repository.is_fork
or repository.is_private
.
The second one applies to any entity type. It can use only attributes from the SelectorEntity
message which currently are entity.name
, entity.provider.name
and entity.provider.class
. (Let's talk about adding the purl or global name separately).
In both the entity-generic message and the entity-specific message we'll also add the k-v map.
|
||
// ArtifactToSelectorEntity converts an Artifact to a SelectorEntity | ||
func (_ *GitHub) ArtifactToSelectorEntity(_ context.Context, a *minderv1.Artifact) *internalpb.SelectorEntity { | ||
fullName := fmt.Sprintf("%s/%s", a.GetOwner(), a.GetName()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this include ghcr.io/
? (I'm not sure)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would say that not the name attribute, but I agree with both you and @blkt (in this comment) that we should have a global name as well.
I think just based on what users would expect in the simplest case we could have:
.name
- for repos this would beorgname/reponame
.full_name
- this could be a clone name perhaps. For repos,https://github.com/org/repo
for artifactsregistry_name/namespace/artifact_name
.id
(help me come with better name :-)) - a minder-specific identifier. I liked @blkt's suggestion
btw note that as long as we expose the individual attributes the user can also combine them with logical operations in the CEL expression so we don't have to go overboard with providing too many one-off attributes.
string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we want a server
component here, e.g. ghcr.io
or eu.gcr.io
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good point. I'm not sure if we store them for artifacts (or for repos for that matter, we do have a clone_url for repos), but I like the idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't that be part of the provider information?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can add that iteratively in another PR.
@evankanderson I keep a branch with patches that will be sent later. The wiring looks like this in executor: // this could be a private function for now and we could just select an inf
selectors, err := e.selBuilder.NewSelectionFromProfile(inf.Type, profile.Selection)
if err != nil {
return fmt.Errorf("error creating selectors: %w", err)
}
selected, err := selectors.Select(provsel.EntityToSelectorEntity(ctx, provider, inf.Type, inf.Entity))
if err != nil {
return fmt.Errorf("error selecting entity: %w", err)
} |
Why does extracting the entity properties need to know the profile? Naively, I'd expect that we could extract the entity properties once per evaluation, not once per profile in the evaluation. |
(not 100% sure I'm answering your question without more context) it doesn't, we can call the extraction earlier. We should also only build the entity properties (the selector message) once we hit the first profile with non-zero selectors because extracting the entity properties might involve calling to the provider (e.g. to get the key-value pairs which we don't store in the database) which might be expensive. |
b6e2cfc
to
fdf1ff6
Compare
Adds a new module called selectors that initializes CEL environments and evaluates selectors. Fixes: mindersec#3757
Summary
Define the protobuf API for selectors - Creates a new protobuf message SelectorEntity that represents a set of attributes for a typed entity to be used in profile selectors.
At the moment, onyl SelectorRepository and SelectorArtifact are used and implemented. Additionally, the key-value store is not implemented either as it requires a new set of provider interfaces to fill this store.
Add the selector evalautor - Adds a new module called selectors that initializes CEL environments and evaluates selector
Fixes #3757
Change Type
Mark the type of change your PR introduces:
Testing
there are unit tests included. I also cherry-pick these from a larger branch that takes this code into account
Review Checklist: