Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Make DataStore Schema (prefix) Configurable #554

Merged

Conversation

SimonKienzler
Copy link
Contributor

Fixes #548

As agreed in the aforementioned discussion, this PR introduces a new TenantControlPlaneSpec field called DataStoreSchema. The field name could also be different, if desired. It allows users to control the database name or etcd prefix. If the field is unset during creation, the defaulting webhook sets the field to <TCPNamespace>_<TCPName>, which is the current behavior. The field is immutable.

Some remarks:

  • I think I found bugs in the current behavior of the defaulting webhook:
    • it never appended to the operations in the OnCreate function, therefore the replicas were never defaulted.
    • the OnUpdate function also had some early returns, so same problem there.
  • Instead of using a webhook, I propose to use kubebuilder markers with CEL to ensure that the DataStore and DataStoreSchema fields can't be unset after creation. This makes the OnUpdate function a NoOp.
  • Added some test cases for the defaulting webhook.
  • Updated the doc string for the DataStore field. According to the docs, datastore migrations are in fact possible.

Copy link

netlify bot commented Aug 21, 2024

Deploy Preview for kamaji-documentation ready!

Name Link
🔨 Latest commit bbadc70
🔍 Latest deploy log https://app.netlify.com/sites/kamaji-documentation/deploys/66f3dbd0e0c7300008adf163
😎 Deploy Preview https://deploy-preview-554--kamaji-documentation.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@@ -49,21 +43,21 @@ func (t TenantControlPlaneDefaults) OnDelete(runtime.Object) AdmissionResponse {
}

func (t TenantControlPlaneDefaults) OnUpdate(object runtime.Object, oldObject runtime.Object) AdmissionResponse {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (t TenantControlPlaneDefaults) OnUpdate(object runtime.Object, oldObject runtime.Object) AdmissionResponse {
func (t TenantControlPlaneDefaults) OnUpdate(runtime.Object, runtime.Object) AdmissionResponse {

@@ -249,12 +249,20 @@ type AddonsSpec struct {
}

// TenantControlPlaneSpec defines the desired state of TenantControlPlane.
// +kubebuilder:validation:XValidation:rule="!has(oldSelf.dataStore) || has(self.dataStore)", message="unsetting the dataStore is not supported"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A CEL policy here would create a breaking change since it requires Kubernetes >= 1.25, and Kamaji has a loose requirement of +1.22 (https://kamaji.clastix.io/reference/versioning/)

@bsctl WDYT here?

Copy link
Member

@bsctl bsctl Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About version, frankly I don’t care. 1.22 is a very old release and we can move to 1.25 as baseline for support. If CEL usage provides benefits over webhook and if we want push in this direction for the future, I think it is a good idea, but I would avoid a mix and match of different techniques unless it provides clear benefits, just for sake of maintenance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my experience, it's convenient to handle as many validation/defaulting tasks as possible using the kubebuilder annotations, i.e. letting the K8s API Server handle that for you. Except for the things that need a more sophisticated logic, then resort to webhooks

@prometherion
Copy link
Member

  • it never appended to the operations in the OnCreate function, therefore the replicas were never defaulted.

Are you referring to changes introduced with 071e679? It seems fine, wondering why we never had a report, maybe everyone's hardly setting values.

@SimonKienzler
Copy link
Contributor Author

Regarding the failing E2E test: The Helm Chart (charts/kamaji) is used for installing Kamaji in the KinD cluster. Is there a make target or script to update the Helm Chart, which I'm missing?

@SimonKienzler
Copy link
Contributor Author

Another question that came to my mind: What would be the preferred way of handling existing TCP CRDs? Should the controller handle this silently (adopting the spec from the status)?

@prometherion
Copy link
Member

Reading from the status is the way to preserve consistency since it's our state machine declaration, and saved us from many issues.

Users will need to patch their TCP with the newer version of the CRD just to be sure to have a consistent status.

@prometherion
Copy link
Member

Is there a make target or script to update the Helm Chart, which I'm missing?

It depends on your dev environment, I'm used to developing with kind and installing Kamaji using the local Helm Chart, changing the ImagePullPolicy of the manager Deployment to Never and overriding the image by uploading it to the Kind instances.

Debugging from your IDE is much more complicated since it requires some hacking of the webhooks with tunnelling, as well as having Kamaji with a TLS certificate: I'm used to ngrok, and happy to provide further help, let's move to DM on the Kubernetes Slack workspace.

This also includes converting OnUpdate() to a no-op, as the
existence and immutability of the fields are already checked
by the API server, thanks to kubebuilder markers.

The webhook ensures that fields like dataStore, dataStoreSchema
are defaulted during creation (if unset), and the CEL expressions
prohibit unsetting them during update.
@SimonKienzler
Copy link
Contributor Author

SimonKienzler commented Sep 2, 2024

Okay, with the changes from #565 incorporated, the E2E test suite passed. Nice 🎉

Regarding the adoption of existing TCP resources, this is less complicated than I thought. As the status will always be the same pattern as what the webhook will default to (<namespace>_<name>), there's nothing special to do in terms of "adopting" resources created with the previous CRD version. We just need to make sure the webhook will just fill in the spec field on updates. we can adopt the dataStoreName from the status on first reconcilation after a controller update. So, open points are:

  1. I will add a check to the controller if the Status.Storage.Setup.Schema field is already set, and if yes, adopt it to the spec.
  2. Would you like me to contribute specific E2E test scenarios regarding the dataStoreSchema field?
  3. Regarding the discussion of 1.22 vs 1.25 as a minimum requirement, I understood that you'd be fine with Kamaji requiring Kubernetes 1.25. I will add a commit to this PR documenting this, unless you tell me otherwise.
  4. Anything else you want me to do before this change can be approved?

@prometherion
Copy link
Member

  • I will add a check to the controller if the Status.Storage.Setup.Schema field is already set, and if yes, adopt it to the spec.

Just reviewed and suggested some prioritizations.

  • Would you like me to contribute specific E2E test scenarios regarding the dataStoreSchema field?

That's enough, thanks.

  • Regarding the discussion of 1.22 vs 1.25 as a minimum requirement, I understood that you'd be fine with Kamaji requiring Kubernetes 1.25. I will add a commit to this PR documenting this, unless you tell me otherwise.

Let's implement the CEL language as you suggested, I'll open a PR once we have it merged to document this change.

  • Anything else you want me to do before this change can be approved?

LGTM!

Co-authored-by: Dario Tranchitella <[email protected]>
@@ -6415,8 +6415,18 @@ spec:
description: |-
DataStore allows to specify a DataStore that should be used to store the Kubernetes data for the given Tenant Control Plane.
This parameter is optional and acts as an override over the default one which is used by the Kamaji Operator.
Migration from a different DataStore to another one is not yet supported and the reconciliation will be blocked.
Migration from a different DataStore to another one is supported, see: https://kamaji.clastix.io/guides/datastore-migration/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is incorrect; rather, this sentence should explain we do not offer migration between Datastore backed by different drivers.

@prometherion prometherion merged commit 8b71843 into clastix:master Oct 2, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants