refactor: effective tls policies reconciler #927

KevFan · 2024-10-09T17:59:19Z

Description

Refactor the management of TLSPolicy Certificates into a workflow task.
Re-add validation for TLSPolicy that checks whether the Issuer kind is correct and whether the issuer is present on the cluster, and add integration tests covering this.
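
As a rough illustration of the validation described above (not the code in this PR), the two checks could look like the sketch below. `IssuerRef`, `validateIssuerRef`, and the `onCluster` lookup map are hypothetical stand-ins; a real reconciler would look the issuer up through the cluster API, and treating an empty kind as `Issuer` is an assumption here.

```go
package main

import (
	"errors"
	"fmt"
)

// IssuerRef is a hypothetical stand-in for the issuer reference on a TLSPolicy.
type IssuerRef struct {
	Name string
	Kind string
}

var (
	ErrInvalidIssuerKind = errors.New("invalid issuer kind")
	ErrIssuerNotFound    = errors.New("issuer not found")
)

// validateIssuerRef checks that the kind is one of the two cert-manager issuer
// kinds and that the referenced issuer is present. onCluster is keyed by
// "Kind/name" and stands in for a real cluster lookup (assumption).
func validateIssuerRef(ref IssuerRef, onCluster map[string]bool) error {
	kind := ref.Kind
	if kind == "" {
		kind = "Issuer" // assumption: an empty kind means a namespaced Issuer
	}
	switch kind {
	case "Issuer", "ClusterIssuer":
		// supported kinds, continue to the presence check
	default:
		return fmt.Errorf("%w: %q", ErrInvalidIssuerKind, ref.Kind)
	}
	if !onCluster[kind+"/"+ref.Name] {
		return fmt.Errorf("%w: %s %q", ErrIssuerNotFound, kind, ref.Name)
	}
	return nil
}

func main() {
	onCluster := map[string]bool{"ClusterIssuer/letsencrypt": true}
	refs := []IssuerRef{
		{Name: "letsencrypt", Kind: "ClusterIssuer"},
		{Name: "letsencrypt", Kind: "Secret"},
		{Name: "missing", Kind: "Issuer"},
	}
	for _, ref := range refs {
		err := validateIssuerRef(ref, onCluster)
		switch {
		case errors.Is(err, ErrInvalidIssuerKind):
			fmt.Println("invalid kind:", err)
		case errors.Is(err, ErrIssuerNotFound):
			fmt.Println("not found:", err)
		default:
			fmt.Println("ok:", ref.Name)
		}
	}
}
```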

Verification

Passing integration tests should be enough to verify that nothing is broken by this change.

KevFan · 2024-10-11T14:23:53Z

controllers/effective_tls_policies_reconciler.go

+		if policy.DeletionTimestamp != nil {
+			logger.V(1).Info("policy is marked for deletion, nothing to do", "name", policy.Name, "namespace", policy.Namespace, "uid", policy.GetUID())
+			continue
+		}


No need for a finalizer, since the associated Certificates will also be deleted via their owner reference.
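
To make the owner-reference argument concrete, here is a toy model of cascading deletion. It is only a sketch: `Cluster`, `Object`, and `OwnerUID` are hypothetical types that imitate what Kubernetes garbage collection does with owner references, not the real API.

```go
package main

import "fmt"

// Object is a hypothetical, minimal stand-in for a Kubernetes object.
type Object struct {
	UID      string
	Kind     string
	OwnerUID string // empty when the object has no owner
}

// Cluster is a toy in-memory object store keyed by UID.
type Cluster struct {
	objects map[string]Object
}

func NewCluster() *Cluster { return &Cluster{objects: map[string]Object{}} }

func (c *Cluster) Add(o Object) { c.objects[o.UID] = o }

func (c *Cluster) Has(uid string) bool {
	_, ok := c.objects[uid]
	return ok
}

// Delete removes the object and, recursively, every object that names it as
// owner. This mimics cascading deletion through owner references.
func (c *Cluster) Delete(uid string) {
	delete(c.objects, uid)
	for _, o := range c.objects {
		if o.OwnerUID == uid {
			c.Delete(o.UID)
		}
	}
}

func main() {
	c := NewCluster()
	c.Add(Object{UID: "policy-1", Kind: "TLSPolicy"})
	c.Add(Object{UID: "cert-1", Kind: "Certificate", OwnerUID: "policy-1"})
	c.Add(Object{UID: "cert-2", Kind: "Certificate", OwnerUID: "policy-2"})

	c.Delete("policy-1")
	fmt.Println("cert-1 still present:", c.Has("cert-1"))
	fmt.Println("cert-2 still present:", c.Has("cert-2"))
}
```

Because the dependents go away with their owner in this model, the reconciler has nothing to clean up for a policy that is being deleted, which is why the loop can simply `continue`.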

Boomatang

Some of my initial thoughts. I have not run the code yet.

Boomatang · 2024-10-18T14:01:36Z

controllers/effective_tls_policies_reconciler.go

+		})
+
+		// Policy is deleted
+		if policy.DeletionTimestamp != nil {


Would it not make more sense to do the deletion timestamp filtering when getting the list of policies in the first place?

I've reverted the filtering logic in the latest commit and started looping through the attached policies of listeners/gateways instead. I've kept this kind of check, but I'm not sure whether there is any chance that a deleted policy would still be attached to the listener or gateway 🤔

Boomatang · 2024-10-18T14:07:33Z

controllers/effective_tls_policies_reconciler.go

+				})
+
+				// Create
+				if !ok {


Can we be more positive in our checks, doing `if ok` instead of `if !ok`? I also wonder if it is better to have the more common action listed first. So if we do more creates, that should be listed first, but if we do more updates, that should be.

Sure but whether to do !ok or ok first depends also on whether we create or update first, so if we want to be more positive in our check here if ok, this would mean update is first 🤔

Not sure which action we expect more of. Maybe @mikenairn can answer this.

I wouldn't expect too many updates of a Certificate resource after creation. I don't have strong opinions on this, but for what it's worth I did if ok in the dnspolicy equivalent task, and dealt with updates first.

if ok { update } / if !ok { create }. Does it matter? The number of jump instructions in the final compiled code is literally the same I think.

If !ok feels like being negative (it doesn't to me BTW), then perhaps we could rename the variable to missing. Is it better? I.e. if missing { create }.

IMO, if it doesn't have any obvious downside in performance, the important thing is that the code is readable. In my head (and it could be only me), I read this in terms of chronology and progressive use cases.

The first thing that happens is having nothing, then:

1. user creates the network resources
2. user creates the policies
3. the operator creates the internal resources (for the first time after seeing the policies)
4. user adjusts the policies and/or network resources
5. operator updates the internal resources accordingly.

Thanks guys 👍 I don't think it matters, in fairness, unless people really feel it hinders readability. My preference is to keep it as is, tbh, as I also prefer dealing with the create first, followed by the update action. If we feel strongly about this, maybe it can be done in a separate PR where it can be dealt with across the entire codebase.
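
For readers following this thread, the create-first ordering under discussion can be sketched as below. This is a minimal illustration with hypothetical types (`Certificate`, `Store`, `reconcileCertificate`), using an in-memory map in place of the cluster; it is not the reconciler code from this PR.

```go
package main

import "fmt"

// Certificate is a hypothetical, trimmed-down certificate resource.
type Certificate struct {
	Name     string
	DNSNames []string
}

// Store is an in-memory stand-in for the certificates found on the cluster.
type Store map[string]Certificate

// equalStrings reports whether two string slices hold the same elements in order.
func equalStrings(a, b []string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := range a {
		if a[i] != b[i] {
			return false
		}
	}
	return true
}

// reconcileCertificate handles the create case first, then the update case,
// and returns which action it took.
func reconcileCertificate(store Store, desired Certificate) string {
	existing, ok := store[desired.Name]

	// Create
	if !ok {
		store[desired.Name] = desired
		return "created"
	}

	// Update
	if !equalStrings(existing.DNSNames, desired.DNSNames) {
		store[desired.Name] = desired
		return "updated"
	}

	return "unchanged"
}

func main() {
	store := Store{}
	cert := Certificate{Name: "gw-api", DNSNames: []string{"api.example.com"}}

	fmt.Println(reconcileCertificate(store, cert))
	fmt.Println(reconcileCertificate(store, cert))

	cert.DNSNames = append(cert.DNSNames, "www.example.com")
	fmt.Println(reconcileCertificate(store, cert))
}
```

Swapping the two branches to `if ok { update } else { create }` yields the same behavior; the choice only affects reading order.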

Boomatang · 2024-10-18T14:18:17Z

controllers/effective_tls_policies_reconciler.go

+		}
+	}
+
+	// Clean up orphaned certs


The cleaning up of orphaned certs gives me an odd feeling, which I have yet to put my finger on. I had the same feeling with the DNS records. The question I find myself asking is: what does the cleaning up of orphaned certs have to do with TLS policies and their reconcile? I have this feeling that this cleanup should be structured as a Task in the Workflow rather than being part of the TLS policy reconcile. There seems to be a mixing of concerns. I understand how this mixing of concerns is ingrained in us from using the controller runtime. This is currently just a feeling that this is a code smell, and I don't yet have a good thought on what to do about it.

One thing that I don't know at the time of writing is how certs become orphaned.

I believe this is because there is an indirect link between policies and the subresource (certificates in this case), and so it gives the expected state of the cluster. Though true, this can probably be done as a separate task, which I can look into if you feel strongly about this.

A cert can be orphaned when a gateway listener is removed, or if the target ref of a TLS policy is changed to another gateway. In both of these cases the cert(s) were already created and will be orphaned, since the policy still exists and is still valid.

Indeed we need to cleanup certs that remained orphan due to, e.g., a listener being removed from a gateway ceteris paribus.

Let's keep this as-is now since it does the job. Maybe in the future we can look for this pattern across the reconcilers (not only this one) and try to improve them all by having "janitors" as separate tasks that can run in parallel to other tasks.

I am happy with leaving this for now. It is something that can be done in the future.
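
The orphaning scenario described in this thread (a listener removed from a gateway while the policy stays valid) can be sketched as a set difference between the certificates that exist and the certificates the current listeners call for. All names below (`Listener`, `Certificate`, `certName`, `expectedCerts`, `orphanedCerts`) are hypothetical, and the naming scheme is an assumption made only for this example.

```go
package main

import "fmt"

// Listener and Certificate are hypothetical, trimmed-down resources.
type Listener struct {
	Name string
}

type Certificate struct {
	Name string
}

// certName derives a certificate name from gateway and listener.
// The naming scheme is an assumption for this sketch.
func certName(gateway string, l Listener) string {
	return gateway + "-" + l.Name
}

// expectedCerts returns the set of certificate names the current listeners need.
func expectedCerts(gateway string, listeners []Listener) map[string]struct{} {
	expected := make(map[string]struct{}, len(listeners))
	for _, l := range listeners {
		expected[certName(gateway, l)] = struct{}{}
	}
	return expected
}

// orphanedCerts returns the existing certificates that no listener needs anymore,
// preserving the order of the existing slice.
func orphanedCerts(existing []Certificate, expected map[string]struct{}) []Certificate {
	var orphans []Certificate
	for _, c := range existing {
		if _, ok := expected[c.Name]; !ok {
			orphans = append(orphans, c)
		}
	}
	return orphans
}

func main() {
	// Certificates created while the gateway had listeners "api" and "www".
	existing := []Certificate{{Name: "gw-api"}, {Name: "gw-www"}}

	// The "www" listener has since been removed; the policy itself is unchanged.
	listeners := []Listener{{Name: "api"}}

	for _, c := range orphanedCerts(existing, expectedCerts("gw", listeners)) {
		fmt.Println("would delete orphaned certificate:", c.Name)
	}
}
```

The same function could run inside the policy reconcile, as in this PR, or in a separate "janitor" task as suggested above; the sketch takes no position on which.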

Signed-off-by: KevFan <[email protected]>

…get ref Signed-off-by: KevFan <[email protected]>

…rts" This reverts commit 552d9f3. Signed-off-by: KevFan <[email protected]>

Signed-off-by: KevFan <[email protected]>

guicassolato · 2024-10-21T08:32:55Z

controllers/effective_tls_policies_reconciler.go

+		policies := l.Policies()
+		if len(policies) == 0 {
+			policies = l.Gateway.Policies()
+		}


Nice work, @KevFan. This will easily adapt to when we decide to support sectionName and effectively inheritance on TLSPolicies.

Is it good enough to just assume that if you have no policies, you use the gateway ones? Should it not check for a policy of the expected type being attached to the listener and, if there is none, fall back? Will this not also reconcile a gateway policy multiple times, once per listener?

Yes, that's true, this should check for policies of the specific type before falling back as well 👍
Yes, unfortunately, this would reconcile a gateway policy multiple times. Not quite sure how to approach this one 🤔

Updated to filter for tls policies. I'm aware of the multiple reconciles for gateway policies but I think I'll look to address this as I look into the reconcile for section name support in another PR.
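
A minimal sketch of the two points raised in this thread: filtering by policy kind before falling back to the gateway, and a gateway-level policy being visited once per listener. The types and function names (`Policy`, `Gateway`, `Listener`, `filterKind`, `effectiveTLSPolicies`, `reconcileCounts`) are hypothetical and do not mirror the actual topology API used by the reconciler.

```go
package main

import "fmt"

// Policy, Gateway and Listener are hypothetical, trimmed-down topology types.
type Policy struct {
	UID  string
	Kind string
}

type Gateway struct {
	Policies []Policy
}

type Listener struct {
	Name     string
	Policies []Policy
	Gateway  *Gateway
}

// filterKind keeps only the policies of the given kind.
func filterKind(policies []Policy, kind string) []Policy {
	var out []Policy
	for _, p := range policies {
		if p.Kind == kind {
			out = append(out, p)
		}
	}
	return out
}

// effectiveTLSPolicies prefers TLS policies attached to the listener and falls
// back to the gateway's TLS policies only when the listener has none.
func effectiveTLSPolicies(l Listener) []Policy {
	policies := filterKind(l.Policies, "TLSPolicy")
	if len(policies) == 0 && l.Gateway != nil {
		policies = filterKind(l.Gateway.Policies, "TLSPolicy")
	}
	return policies
}

// reconcileCounts reports how many listeners resolve to each policy UID,
// which shows a gateway-level policy being visited once per listener.
func reconcileCounts(listeners []Listener) map[string]int {
	counts := map[string]int{}
	for _, l := range listeners {
		for _, p := range effectiveTLSPolicies(l) {
			counts[p.UID]++
		}
	}
	return counts
}

func main() {
	gw := &Gateway{Policies: []Policy{
		{UID: "gw-tls", Kind: "TLSPolicy"},
		{UID: "gw-dns", Kind: "DNSPolicy"},
	}}
	listeners := []Listener{
		// Has a policy attached, but not a TLS one, so it falls back to the gateway.
		{Name: "api", Gateway: gw, Policies: []Policy{{UID: "l-dns", Kind: "DNSPolicy"}}},
		// Has its own TLS policy, so no fallback happens.
		{Name: "www", Gateway: gw, Policies: []Policy{{UID: "l-tls", Kind: "TLSPolicy"}}},
		// Has no policies at all, so it falls back to the gateway.
		{Name: "admin", Gateway: gw},
	}
	for uid, n := range reconcileCounts(listeners) {
		fmt.Printf("%s reconciled %d time(s)\n", uid, n)
	}
}
```

Without the kind filter, the `api` listener above would never fall back, because its non-TLS policy makes the policy list non-empty.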

Signed-off-by: KevFan <[email protected]>

Boomatang

I am happy to approve this. Any comment I had was based around styling which can be done later.

KevFan self-assigned this Oct 11, 2024

KevFan added the kind/enhancement New feature or request label Oct 11, 2024

KevFan commented Oct 11, 2024

maleck13 reviewed Oct 15, 2024

KevFan force-pushed the effective-tls-reconciler branch 5 times, most recently from af8c44c to cd35907 Compare October 16, 2024 09:21

KevFan marked this pull request as ready for review October 16, 2024 09:24

KevFan requested review from guicassolato, Boomatang and mikenairn October 16, 2024 11:45

guicassolato reviewed Oct 16, 2024
