Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

✨ Restrict cache for all ConfigMap/Secret objects #722

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

akutz
Copy link
Collaborator

@akutz akutz commented Sep 24, 2024

What does this PR do, and why is it needed?

This patch updates the manager cache for VM Operator so that ConfigMap/Secret resources from the kube-system and VM Op pod namespaces are cached, but ConfigMap/Secret resources in any other namespace are not cached.

This patch means controllers that access ConfigMap/Secret resources in these namespaces no longer need to create separate caches. Instead, all controllers may use the manager client unless for some other reason.

Which issue(s) is/are addressed by this PR? (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):

Fixes NA

Are there any special notes for your reviewer:

The tests in controllers/infra/secret/infra_secret_controller_intg_test.go related to checking the value of called validates the expected behavior.

Please add a release note if necessary:

Restrict caching of ConfigMap and Secret resources to kube-system and the VM Operator pod namespaces.

This patch updates the manager cache for VM Operator so that
ConfigMap/Secret resources from the kube-system and VM Op pod
namespaces are cached, but ConfigMap/Secret resources in any
other namespace are *not* cached.

This patch means controllers that access ConfigMap/Secret
resources in these namespaces no longer need to create separate
caches. Instead, all controllers may use the manager client unless
for some other reason.
@akutz akutz requested a review from bryanv September 24, 2024 15:08
@github-actions github-actions bot added the size/L Denotes a PR that changes 100-499 lines. label Sep 24, 2024
@akutz
Copy link
Collaborator Author

akutz commented Sep 24, 2024

@sbueringer Ugh. This is failing with:

$ go test -timeout 120s -run '^TestManager$' ./pkg/manager/test
Running Suite: Manager Suite - /Users/akutz/Projects/vmop/vmop/pkg/manager/test
===============================================================================
Random Seed: 1727195109

Will run 3 of 3 specs
------------------------------
• [FAILED] [0.022 seconds]
Integration tests Cache Getting objects ConfigMap [It] should return the object with a live read [envtest]
/Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:57

  Timeline >>
  STEP: Creating a temporary namespace @ 09/24/24 11:25:15.684
  [FAILED] in [It] - /Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:58 @ 09/24/24 11:25:15.702
  STEP: Destroying temporary namespace @ 09/24/24 11:25:15.702
  << Timeline

  [FAILED] Expected success, but got an error:
      <*errors.errorString | 0x14000e06050>: 
      unable to get: 2c57ba0a-9e3c-44eb-8721-b68a1bf4f65c/my-object-7kbwc because of unknown namespace for the cache
      {
          s: "unable to get: 2c57ba0a-9e3c-44eb-8721-b68a1bf4f65c/my-object-7kbwc because of unknown namespace for the cache",
      }
  In [It] at: /Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:58 @ 09/24/24 11:25:15.702
------------------------------
SSI0924 11:25:15.707798   99028 internal.go:538] "Stopping and waiting for non leader election runnables"
I0924 11:25:15.707837   99028 internal.go:542] "Stopping and waiting for leader election runnables"
I0924 11:25:15.707857   99028 internal.go:550] "Stopping and waiting for caches"
I0924 11:25:15.707954   99028 internal.go:554] "Stopping and waiting for webhooks"
I0924 11:25:15.707967   99028 internal.go:557] "Stopping and waiting for HTTP servers"
I0924 11:25:15.707977   99028 internal.go:561] "Wait completed, proceeding to shutdown the manager"


Summarizing 1 Failure:
  [FAIL] Integration tests Cache Getting objects ConfigMap [It] should return the object with a live read [envtest]
  /Users/akutz/Projects/vmop/vmop/pkg/manager/test/cache_test.go:58

Ran 1 of 3 Specs in 7.299 seconds
FAIL! -- 0 Passed | 1 Failed | 0 Pending | 2 Skipped
--- FAIL: TestManager (7.30s)
FAIL
FAIL	github.com/vmware-tanzu/vm-operator/pkg/manager/test	8.053s
FAIL

It seems that controller-runtime does not fall-through to the API reader / bypass cache if any namespace mappings are present for an object.

Is there not any way we can utilize ByObject or Disable to:

  • Disable caching for ConfigMap and Secret resources for all namespaces except those we bless.
  • Do the above using the default manager client/cache.
  • Not need to set up special caches for controllers that need to watch/cache ConfigMap/Secret resources in the allowed namespaces.

Thanks!

@sbueringer
Copy link

sbueringer commented Sep 25, 2024

It seems that controller-runtime does not fall-through to the API reader / bypass cache if any namespace mappings are present for an object.

That is correct.

Some context:

  • Via cache.Options you can configure what exactly should be cached (including which namespaces for which gvk etc.)
  • Via client.Options you can configure for which requests the client should use the cache vs live clients:
  • Once the client decided to either use the cache or the live clients this choice is final. There is no "fallback" e.g. if the Cache returns a NotFound. I assume this is because you don't want to end up spaming the apiserver when an object actually doesn't exist

Potential solutions

  • I think your cache config is correct and there is no way to configure it better. It's definitely not needed to set up separate caches for ConfigMaps and Secrets.
  • I think the key is in solving this at the "client-level". A few options:
    1. Set DisableFor for the mgr.GetClient() so the mgr.GetClient() doesn't use the cache for ConfigMaps and Secrets. Create an additional client for the cases where you want to cache. We did this here in CAPI: https://github.com/kubernetes-sigs/cluster-api/blob/3232abcf390f5364d4ebfc4ce7c2b5fac0281686/main.go#L389-L398
    2. The reverse, mgr.GetClient() uses the Cache for all ConfigMaps/Secrets. Then you have to explicitly use a live client (e.g. mgr.GetAPIReader()) in the cases where you want to use a live client for ConfigMaps/Secrets
    3. Build a Client that internally decides in which cases a live client or the cache should be used. This would be probably a thin wrapper around mgr.GetClient(). One easy way to do this is to use interceptor.NewClient, overwrite Get & List and add some if/else based on namespace.
    4. Ideal mid-term option would be to extend client.CacheOptions.DisableFor to allow more fine-granular configuration when the cache should be used

@akutz
Copy link
Collaborator Author

akutz commented Sep 25, 2024

First of all, thank you @sbueringer!

  • I think the key is in solving this at the "client-level". A few options:

    1. Set DisableFor for the mgr.GetClient() so the mgr.GetClient() doesn't use the cache for ConfigMaps and Secrets. Create an additional client for the cases where you want to cache. We did this here in CAPI: https://github.com/kubernetes-sigs/cluster-api/blob/3232abcf390f5364d4ebfc4ce7c2b5fac0281686/main.go#L389-L398
    2. The reverse, mgr.GetClient() uses the Cache for all ConfigMaps/Secrets. Then you have to explicitly use a live client (e.g. mgr.GetAPIReader()) in the cases where you want to use a live client for ConfigMaps/Secrets
    3. Build a Client that internally decides in which cases a live client or the cache should be used. This would be probably a thin wrapper around mgr.GetClient(). One easy way to do this is to use interceptor.NewClient, overwrite Get & List and add some if/else based on namespace.
    4. Ideal mid-term option would be to extend client.CacheOptions.DisableFor to allow more fine-granular configuration when the cache should be used

Ironically, this is exactly what we used to do prior to CR introducing the DisableFor feature. We had a client proxy that decided which resources used the cache versus uses the reader.

Okay, I guess we need to go back to that then. Thanks again @sbueringer !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-not-required size/L Denotes a PR that changes 100-499 lines.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants