refactor: effective tls policies reconciler #927

Merged
merged 15 commits into from
Oct 21, 2024

Conversation

KevFan
Contributor

@KevFan KevFan commented Oct 9, 2024

Description

Part of #819 #824

  • Refactor the logic for the management of TLSPolicy Certificates into a workflow task
  • Re-add validation for TLSPolicy that checks whether the Issuer kind is correct and whether the issuer is present on the cluster, with integration tests added to cover this

Verification

  • Passing integration tests should be enough to verify that nothing is broken by this change

@KevFan KevFan self-assigned this Oct 11, 2024
@KevFan KevFan added the kind/enhancement New feature or request label Oct 11, 2024
Comment on lines 73 to 83
if policy.DeletionTimestamp != nil {
	logger.V(1).Info("policy is marked for deletion, nothing to do", "name", policy.Name, "namespace", policy.Namespace, "uid", policy.GetUID())
	continue
}
Contributor Author

No need for a finalizer, since the associated Certificates will also be deleted via their owner reference

@KevFan KevFan force-pushed the effective-tls-reconciler branch 5 times, most recently from af8c44c to cd35907 Compare October 16, 2024 09:21
@KevFan KevFan marked this pull request as ready for review October 16, 2024 09:24
@KevFan KevFan force-pushed the effective-tls-reconciler branch 2 times, most recently from 2c1093b to bd7fc23 Compare October 18, 2024 10:58
Contributor

@Boomatang Boomatang left a comment

Some of my initial thoughts. I have not run the code yet.

})

// Policy is deleted
if policy.DeletionTimestamp != nil {
Contributor

Would it not make more sense to do the deletion timestamp filtering when getting the list of policies in the first place?

Contributor Author

I've reverted the filtering logic in the latest commit and started looping through the attached policies of listeners/gateways instead. I've kept this check, but I'm not sure whether there is any chance that a deleted policy would still be attached to the listener or gateway 🤔

})

// Create
if !ok {
Contributor

Can we be more positive in our checks, doing `if ok` instead of `if !ok`? I also wonder whether it is better to list the more common action first: if we do more creates, create should be listed first, but if we do more updates, update should be.

Contributor Author

Sure, but whether to check `!ok` or `ok` first also depends on whether we create or update first. If we want the more positive `if ok` check here, that would mean update comes first 🤔

Not sure which action we expect more of. Maybe @mikenairn can answer this.

Member

I wouldn't expect too many updates of a Certificate resource after creation. I don't have strong opinions on this, but for what it's worth, I did `if ok` in the equivalent DNSPolicy task and deal with updates first.

Contributor

if ok { update } / if !ok { create }. Does it matter? The number of jump instructions in the final compiled code is literally the same I think.

If !ok feels like being negative (it doesn't to me BTW), then perhaps we could rename the variable to missing. Is it better? I.e. if missing { create }.

IMO, if it doesn't have any obvious downside in performance, the important thing is that the code is readable. In my head (and it could be only me), I read this in terms of chronology and progressive use cases.

The first thing that happens is having nothing, then:

  1. user creates the network resources
  2. user creates the policies
  3. the operator creates the internal resources (for the first time after seeing the policies)
  4. user adjusts the policies and/or network resources
  5. operator updates the internal resources accordingly.

Contributor Author

Thanks guys 👍 In fairness, I don't think it matters unless people really feel it hinders readability. My preference is to keep it as is, tbh, as I also prefer dealing with the create first, followed by the update. If we feel strongly about this, maybe it can be done in a separate PR where it is dealt with across the entire codebase.
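
To make the ordering discussed in this thread concrete, here is a minimal sketch of the create-first, then-update shape. It uses an in-memory map in place of the cluster; `Certificate`, `reconcileCertificate` and `equalStrings` are hypothetical names for illustration, not the code in this PR.

```go
package main

import "fmt"

// Certificate is a hypothetical, simplified stand-in for a cert-manager
// Certificate resource.
type Certificate struct {
	Name     string
	DNSNames []string
}

// equalStrings reports whether two string slices have the same elements
// in the same order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// reconcileCertificate brings the stored certificate in line with the desired
// one and reports which action was taken. The map stands in for the cluster.
func reconcileCertificate(existing map[string]Certificate, desired Certificate) string {
	current, ok := existing[desired.Name]

	// Create: nothing with this name exists yet.
	if !ok {
		existing[desired.Name] = desired
		return "created"
	}

	// Update: the object exists but differs from the desired state.
	if !equalStrings(current.DNSNames, desired.DNSNames) {
		existing[desired.Name] = desired
		return "updated"
	}

	return "unchanged"
}

func main() {
	store := map[string]Certificate{}
	cert := Certificate{Name: "gw-api", DNSNames: []string{"api.example.com"}}

	fmt.Println(reconcileCertificate(store, cert)) // first sight of the policy
	fmt.Println(reconcileCertificate(store, cert)) // nothing changed
	cert.DNSNames = append(cert.DNSNames, "www.example.com")
	fmt.Println(reconcileCertificate(store, cert)) // listener hostnames changed
}
```

Swapping the two branches to lead with `if ok` would give the update-first variant without changing behaviour, which is why the choice here is one of readability only.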

}
}

// Clean up orphaned certs
Contributor

The cleaning up of orphaned certs gives me an odd feeling, which I have yet to put my finger on. I had the same feeling with the DNS records. The question I find myself asking is what the cleaning up of orphaned certs has to do with TLS policies and their reconcile. I have a feeling that this clean-up should be structured as a Task in the Workflow rather than being part of the TLS policy reconcile. There seems to be a mixing of concerns. I understand how this mixing of concerns is ingrained in us from using the controller runtime. For now this is only a feeling that there is a code smell, and I don't yet have a good thought on what to do about it.

One thing that I don't know at the time of writing is how certs become orphaned.

Contributor Author

I believe this is because there is an indirect link between policies and the sub-resource (certificates in this case), and so the policies give the expected state of the cluster. That said, this can probably be done as a separate task, which I can look into if you feel strongly about it.

A cert can be orphaned when a gateway listener is removed, or if the target ref of a TLS policy is changed to another gateway. In both of these cases the cert(s) have already been created and will be orphaned, since the policy still exists and is still valid.

Contributor

Indeed, we need to clean up certs that were left orphaned due to, e.g., a listener being removed from a gateway, ceteris paribus.

Let's keep this as-is now since it does the job. Maybe in the future we can look for this pattern across the reconcilers (not only this one) and try to improve them all by having "janitors" as separate tasks that can run in parallel to other tasks.

Contributor

I am happy with leaving this for now. It is something that can be done in the future.
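
As a rough illustration of the clean-up discussed in this thread, the orphan check can be seen as a set difference: certificates that exist on the cluster but are no longer expected by any listener/policy pair. The sketch below works on plain certificate names; `orphanedCerts` is a hypothetical helper and not the function used in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// orphanedCerts returns the names that are present in existing but absent
// from expected, sorted for stable output.
func orphanedCerts(existing, expected []string) []string {
	want := make(map[string]struct{}, len(expected))
	for _, name := range expected {
		want[name] = struct{}{}
	}

	var orphans []string
	for _, name := range existing {
		if _, ok := want[name]; !ok {
			orphans = append(orphans, name)
		}
	}
	sort.Strings(orphans)
	return orphans
}

func main() {
	// "gw-old" belonged to a listener that has since been removed.
	existing := []string{"gw-api", "gw-old"}
	expected := []string{"gw-api"}

	fmt.Println(orphanedCerts(existing, expected)) // prints: [gw-old]
}
```

Because the function only needs the two lists, it could live either inside the policy reconcile or in a separate "janitor" task, as suggested above.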

controllers/effective_tls_policies_reconciler.go (outdated, resolved)
controllers/tls_workflow.go (outdated, resolved)
controllers/tlspolicy_status_updater.go (resolved)
@KevFan KevFan force-pushed the effective-tls-reconciler branch from d8d388d to ae206ee Compare October 21, 2024 06:44
Comment on lines 80 to 83
policies := l.Policies()
if len(policies) == 0 {
	policies = l.Gateway.Policies()
}
Contributor

Nice work, @KevFan. This will adapt easily when we decide to support sectionName and, effectively, inheritance on TLSPolicies.

Member

@mikenairn mikenairn Oct 21, 2024

Is it good enough to just assume that if you have no policies, you use the gateway ones? Should it not check for a policy of the expected type attached to the listener and, if there is none, fall back? Will this not also reconcile a gateway policy multiple times, once per listener?

Contributor Author

Yes, that's true, this should also check for policies of the specific type before falling back 👍
Yes, unfortunately, this would reconcile a gateway policy multiple times. Not quite sure how to approach this one 🤔

Contributor Author

Updated to filter for tls policies. I'm aware of the multiple reconciles for gateway policies but I think I'll look to address this as I look into the reconcile for section name support in another PR.
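
A minimal sketch of the type-aware fallback described in this thread: take the listener's own policies of the expected kind and only fall back to the gateway's when there are none. The `Policy`, `Gateway` and `Listener` types and the `effectivePolicies` helper are hypothetical and simplified, and do not reflect the actual topology API used in the PR. The sketch does not address the repeated reconcile of a gateway policy once per listener, which is left for a later PR above.

```go
package main

import "fmt"

// Policy, Gateway and Listener are hypothetical, simplified stand-ins for the
// topology objects discussed in the thread.
type Policy struct {
	Kind string
	Name string
}

type Gateway struct {
	Policies []Policy
}

type Listener struct {
	Name     string
	Policies []Policy
	Gateway  *Gateway
}

// filterKind keeps only the policies of the given kind.
func filterKind(policies []Policy, kind string) []Policy {
	var out []Policy
	for _, p := range policies {
		if p.Kind == kind {
			out = append(out, p)
		}
	}
	return out
}

// effectivePolicies returns the listener's own policies of the given kind and
// falls back to the gateway's policies of that kind only when the listener
// has none.
func effectivePolicies(l Listener, kind string) []Policy {
	if own := filterKind(l.Policies, kind); len(own) > 0 {
		return own
	}
	if l.Gateway == nil {
		return nil
	}
	return filterKind(l.Gateway.Policies, kind)
}

func main() {
	gw := &Gateway{Policies: []Policy{
		{Kind: "TLSPolicy", Name: "gw-tls"},
		{Kind: "DNSPolicy", Name: "gw-dns"},
	}}
	// The listener has a policy attached, but not a TLSPolicy, so the
	// gateway's TLSPolicy applies.
	l := Listener{
		Name:     "api",
		Policies: []Policy{{Kind: "DNSPolicy", Name: "listener-dns"}},
		Gateway:  gw,
	}
	for _, p := range effectivePolicies(l, "TLSPolicy") {
		fmt.Println(l.Name, "->", p.Name)
	}
}
```

Filtering by kind before testing for emptiness is what prevents a listener with an unrelated policy from skipping the gateway fallback.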

@KevFan KevFan force-pushed the effective-tls-reconciler branch from b177795 to c986a25 Compare October 21, 2024 10:49
Contributor

@Boomatang Boomatang left a comment

I am happy to approve this. Any comment I had was based around styling which can be done later.

@KevFan KevFan merged commit aba1a78 into Kuadrant:main Oct 21, 2024
24 checks passed
Labels
kind/enhancement New feature or request
5 participants