
adds watch-list implementation without breaking changes #1255

Merged
merged 2 commits into kube-rs:main on Aug 22, 2023

Conversation

@casualjim (Contributor) commented Jul 17, 2023

Motivation

Implement support for watch-list.

We have 18,000 namespaces in our cluster, and we store custom resources where some collections have large schemas and cardinality in the hundreds of thousands. For example, when a client doesn't use pagination, our API server memory usage can balloon to 200-300 GB under certain conditions.

Solution

The most important goal is to reduce memory pressure on the API server. It would be nice to reduce it on the client side too; in that case, we also care about knowing when the initial sync and/or resync is in progress, or at the very least when it has completed.

Since this is for a new k8s version, I think it might be OK to change the watch event contract slightly. I've come up with a couple of approaches:

Option 1: do nothing: still accumulate everything in a Vec and emit Event::Restarted as the first event when this streaming model is enabled. The name of the event is a bit weird and could be renamed to express intent better, although that would introduce breaking changes.
Option 2: modify the Applied event to carry an Option or enum value indicating whether we're still in the initial listing state.
Option 3: introduce 2 new events: one that indicates a relist is starting and one that indicates the relist has completed. This removes the need for the Restarted event, because the 2 new events give enough signal.
Option 4: stop being helpful and just expose the same events as the plain k8s API.

Personally I'm in favor of option 3. It would allow keeping client-side memory usage low, and it would allow flagging a store as in or out of sync.
This PR currently implements option 1.
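To make the option 3 contract concrete, here is a hypothetical sketch (type and variant names are illustrative only, not the actual kube-rs API) of how the two new events could let a store track its sync state without buffering the whole list:

```rust
// Illustrative sketch of the option 3 event contract; names are hypothetical.
#[derive(Debug, PartialEq)]
pub enum Event<K> {
    /// A regular apply/update, streamed through as it arrives.
    Applied(K),
    /// An object was deleted.
    Deleted(K),
    /// A (re)list has started; a consuming store is now out of sync.
    InitStarted,
    /// The (re)list finished; the store has seen every live object.
    InitDone,
}

/// A store can derive its sync flag purely from the two new events,
/// without accumulating objects in a Vec (the memory win over option 1).
pub fn is_in_sync<K>(events: &[Event<K>]) -> bool {
    events
        .iter()
        .rev()
        .find_map(|e| match e {
            Event::InitStarted => Some(false),
            Event::InitDone => Some(true),
            _ => None,
        })
        .unwrap_or(false)
}
```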

This is a draft, so it's not signed. I'll work through what's required to submit PRs from my work email for the final PR; I just wanted to gather feedback.

closes: #1254

@clux (Member) commented Jul 17, 2023

Hey, thanks a lot for this. While this goes further than the original help-wanted issue (as the runtime side is a little less clear), I appreciate you taking the initiative in an open-minded way with several options. I'll leave some comments on the PR, but first some high-level notes:

Personally, I am also leaning towards option 3 for the watcher side of things, as that's one of the few things that can actually help with the client spikiness. Option 3 would necessarily involve buffering at the reflector side, but that would actually help both people using reflectors and people just watching:

  • watcher users get everything as it comes in
  • reflector users can prune before caching (using WatchStreamExt::modify), because the raw events are passed through from watcher before being buffered - so there is still a spike, but a more controllable one
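The split described in the two bullets can be sketched roughly as follows (a minimal, self-contained model with hypothetical names; not the actual kube-runtime types): the watcher streams pages through untouched, and only the reflector accumulates them, pruning each object first.

```rust
// Illustrative sketch of buffering at the reflector layer rather than the
// network/watcher layer. Names are hypothetical.
#[derive(Clone, Debug, PartialEq)]
pub enum RawEvent<K> {
    /// One page of the initial list, streamed through as it arrives.
    Page(Vec<K>),
    /// Bookmark signalling the initial list is complete.
    ListDone,
}

pub struct Reflector<K> {
    buffer: Vec<K>,
    pub store: Vec<K>,
}

impl<K> Reflector<K> {
    pub fn new() -> Self {
        Self { buffer: Vec::new(), store: Vec::new() }
    }

    /// `prune` plays the role of WatchStreamExt::modify here: it can shrink
    /// each object (e.g. drop managed fields) before it is buffered.
    pub fn observe(&mut self, ev: RawEvent<K>, prune: impl Fn(&mut K)) {
        match ev {
            RawEvent::Page(objs) => {
                for mut o in objs {
                    prune(&mut o);
                    self.buffer.push(o);
                }
            }
            // Only on completion does the buffered list replace the store,
            // so the spike is confined to the reflector and is controllable.
            RawEvent::ListDone => self.store = std::mem::take(&mut self.buffer),
        }
    }
}
```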

Note also that in general the aim is not to abstract away the k8s API. In particular, kube-core and kube-client should just mirror the k8s API (where copy-pasted doc-strings are often a thing for the same parameters). So I don't think 3 and 4 are necessarily in conflict.

@clux (Member) left a comment:

quick points

e2e/Cargo.toml Outdated
latest = ["k8s-openapi/v1_26"]
latest = ["k8s-openapi/v1_27"]
@clux (Member) Jul 17, 2023

There is a `just bump-k8s` target in the justfile which will catch everything (and bump the MK8SV in the process). If you updated manually, it might be worth running that to ensure you catch the stragglers (such as the .github/workflow pins and README pins).


FYI, you can do "k8s-openapi/latest" now.
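As a sketch of that suggestion (assuming the "latest" version-alias feature mentioned in the comment above), the e2e crate's feature could delegate instead of pinning a specific version feature:

```toml
# Hypothetical Cargo.toml fragment: track k8s-openapi's newest supported
# Kubernetes version via its "latest" alias rather than e.g. "v1_27".
[features]
latest = ["k8s-openapi/latest"]
```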

@casualjim (Contributor, Author)

Thanks for the feedback, I'll apply your suggestions manually

@casualjim (Contributor, Author)

I was thinking that because it's an API change, we could create a watcher2 or streaming_watcher, that has the event contract from option 3. That way we can mark the previous implementation as deprecated, and carry it forward a few versions before removing it. That would give people who depend on this library time to move to the new implementation instead of it being forced on them.

@clux (Member) commented Jul 18, 2023

I was thinking that because it's an API change, we could create a watcher2 or streaming_watcher, that has the event contract from option 3. That way we can mark the previous implementation as deprecated, and carry it forward a few versions before removing it. That would give people who depend on this library time to move to the new implementation instead of it being forced on them.

Initially, I feel it's probably better to keep it as a single entrypoint. The breakage mostly affects a lower-level interface (relevant mainly to custom store implementors, for whom handling the new event should not be too hard) rather than the watcher itself, and it's nice to just flick a switch in the config that's available immediately rather than find a new import.

There is potentially another change to the watcher that might make its way in (Arc-wrapping objects - a remainder from #1080); if both of those need to land, maybe we can copy out an old version and do a full deprecation cycle. I'll see where we are, but in the meantime don't worry about it in this PR - that's more of a release problem.

@nightkr (Member) commented Jul 20, 2023

I think option 3 makes sense as a longer-term goal, but I'd rather get this PR in using option 1 and then focus on moving the buffering into reflector in a later PR. Removing the buffering from the network layer is still an improvement in its own right.

@casualjim (Contributor, Author)

I've addressed the comments

@codecov bot commented Jul 22, 2023

Codecov Report

Merging #1255 (85eda87) into main (d410464) will decrease coverage by 0.51%.
The diff coverage is 30.86%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1255      +/-   ##
==========================================
- Coverage   72.66%   72.15%   -0.51%     
==========================================
  Files          75       75              
  Lines        6186     6249      +63     
==========================================
+ Hits         4495     4509      +14     
- Misses       1691     1740      +49     
Files Changed Coverage Δ
kube-client/src/client/tls.rs 79.16% <ø> (ø)
kube-core/src/watch.rs 57.14% <ø> (ø)
kube-runtime/src/watcher.rs 38.13% <18.46%> (-10.08%) ⬇️
kube-core/src/params.rs 81.13% <70.00%> (-0.56%) ⬇️
kube-core/src/request.rs 98.73% <100.00%> (+0.01%) ⬆️

@clux (Member) left a comment:

General setup looks good! Some issues with resource versions, missing version gates, and partially bumped k8s versions. It's also currently not working on the 1.27 cluster I tested this on:

2023-07-22T21:10:53.985375Z DEBUG kube_runtime::utils::stream_backoff: Error received, backing off deadline=Instant { tv_sec: 37803, tv_nsec: 770027883 } duration=85.857327ms
Error: watch stream failed: ApiError: ListOptions.meta.k8s.io "" is invalid: resourceVersionMatch: Forbidden: resourceVersionMatch is forbidden for watch unless sendInitialEvents is provided: Invalid (ErrorResponse { status: "Failure", message: "ListOptions.meta.k8s.io \"\" is invalid: resourceVersionMatch: Forbidden: resourceVersionMatch is forbidden for watch unless sendInitialEvents is provided", reason: "Invalid", code: 422 })

Caused by:
    0: ApiError: ListOptions.meta.k8s.io "" is invalid: resourceVersionMatch: Forbidden: resourceVersionMatch is forbidden for watch unless sendInitialEvents is provided: Invalid (ErrorResponse { status: "Failure", message: "ListOptions.meta.k8s.io \"\" is invalid: resourceVersionMatch: Forbidden: resourceVersionMatch is forbidden for watch unless sendInitialEvents is provided", reason: "Invalid", code: 422 })
    1: ListOptions.meta.k8s.io "" is invalid: resourceVersionMatch: Forbidden: resourceVersionMatch is forbidden for watch unless sendInitialEvents is provided: Invalid

when using

diff --git a/examples/pod_watcher.rs b/examples/pod_watcher.rs
index e4e79fc4..a485892f 100644
--- a/examples/pod_watcher.rs
+++ b/examples/pod_watcher.rs
@@ -13,7 +13,7 @@ async fn main() -> anyhow::Result<()> {
     let client = Client::try_default().await?;
     let api = Api::<Pod>::default_namespaced(client);
 
-    watcher(api, watcher::Config::default())
+    watcher(api, watcher::Config::default().streaming_lists().any_semantic())
         .applied_objects()
         .default_backoff()
         .try_for_each(|p| async move {
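The 422 above comes from apiserver-side validation of the watch options. A minimal model of that rule (illustrative only; this is not kube-rs or apiserver code, and the option names are simplified) shows why setting resourceVersionMatch without sendInitialEvents is rejected:

```rust
// Hypothetical model of the validation rule quoted in the error message:
// on a watch request, resourceVersionMatch is only accepted when
// sendInitialEvents is also provided.
pub struct WatchOpts {
    pub send_initial_events: bool,
    pub resource_version_match: Option<String>,
}

pub fn validate_watch(opts: &WatchOpts) -> Result<(), String> {
    if opts.resource_version_match.is_some() && !opts.send_initial_events {
        return Err(
            "resourceVersionMatch is forbidden for watch unless sendInitialEvents is provided"
                .to_string(),
        );
    }
    Ok(())
}
```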

@clux clux added the changelog-add changelog added category for prs label Jul 22, 2023
@clux clux added this to the 0.85.0 milestone Jul 22, 2023
@casualjim (Contributor, Author)

I'll add a namespace example reflector that shows the watch-list behavior. Should I feature-gate all the doc comments I made?

@casualjim (Contributor, Author)

I've added an example now and updated the k3d configuration to enable the feature gate.

@casualjim casualjim force-pushed the add-watch-list branch 2 times, most recently from e61da70 to 145a1cd Compare July 23, 2023 05:17
@casualjim casualjim marked this pull request as ready for review July 23, 2023 05:17
@clux (Member) left a comment:

This does actually work for me now! Thanks for the k3d arg.

Left a few more suggestions on serialization/builders/configuration docs. This veers a bit into the philosophical discussion of "making invalid states unrepresentable" - something we want to do for the abstractions, but not necessarily something we can sensibly disallow completely at the serialization layer, because we don't necessarily know what comes next (or whether Kubernetes' docs are correct/will stay correct).

In general I've suggested moving our own safety checks and preconfigured "good defaults" into validators, and added some suggestions for watcher::Config to make the correct choice clearer and less implicit.

(There may be some follow-up issues that are worth raising here on our end as well; maybe we should not allow disabling bookmarks in watcher, and maybe we shouldn't be hard-enforcing a timeout of 295s in validate.)
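The kind of client-side validator being discussed can be sketched as follows (a hypothetical model, not the actual kube-rs validate(); field names are illustrative, and the 295s cap and the bookmarks requirement are taken from the comments in this thread):

```rust
// Hypothetical client-side validator: reject invalid combinations before
// they ever reach the apiserver.
pub struct Params {
    pub timeout_secs: u32,
    pub bookmarks: bool,
    pub send_initial_events: bool,
}

pub fn validate(p: &Params) -> Result<(), String> {
    // The hard 295s watch-timeout limit mentioned above.
    if p.timeout_secs > 295 {
        return Err("timeout must be at most 295s".into());
    }
    // Streaming lists require bookmarks, so disallow that combination.
    if p.send_initial_events && !p.bookmarks {
        return Err("sendInitialEvents requires bookmarks".into());
    }
    Ok(())
}
```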

@@ -347,6 +384,7 @@ impl Default for WatchParams {
label_selector: None,
field_selector: None,
timeout: None,
send_initial_events: false,
@clux (Member):

This is a sensible default for now. I think we should maybe add a new constructor that sets the correct flags, rather than mess with the builders/serialization. Something like:

impl WatchParams {
    /// Constructor for doing Kubernetes 1.27 Streaming List watches
    ///
    /// Enables [`VersionMatch::NotGreaterThan`] semantics and [`WatchParams::send_initial_events`].
    pub fn streaming_lists() -> Self {
        Self {
            send_initial_events: true,
            version_match: Some(VersionMatch::NotGreaterThan), // required
            bookmarks: true, // required
            ..WatchParams::default()
        }
    }
}

@clux (Member) commented Aug 4, 2023

@casualjim are you still working on this? I see you've gone down the rabbit hole of fiddling with resolver things and making CI throw a fit. There are a few outstanding PR comments to be resolved, so please let me know your intentions.

Your PR is ultimately very good and it's heavily appreciated (even though we are stuck here on minutiae and CI stuff atm). There's nothing really blocking from the k8s-openapi side to release their new minor with v1_27 (as we have both tested that it works), so in the interest of not holding up the rest of the ecosystem, we can either try to expedite this PR or release a new kube whenever they push their minor.

@Arnavion commented Aug 4, 2023

There's nothing really blocking from the k8s-openapi side to release their new minor with v1_27 (as we have both tested that it works)

Cool. I'll release it this weekend then.

@casualjim (Contributor, Author)

I would like to get past this hurdle of CI, yes. Unfortunately I have to do planning and roadmap work for my project (this library is part of said project) and will need to spend a lot of time in Google Docs instead of in code this weekend. The failures are about building k8s-openapi, so I'm going to wait for a release of that.

I think the tests etc. passed when I ran them locally, so hopefully this will build here too. If not, I can take another look next weekend perhaps.

@casualjim (Contributor, Author)

I don't know what to do with CI.
I can run the just tasks just fine on my computer, but as soon as I try to do it with act, the deny check starts failing, and something that's supposed to set up k3d keeps telling me that kubectl can't be found.

So it's difficult to get this working for me at the moment. This could use some help from somebody who knows the entire toolchain.

@clux (Member) commented Aug 15, 2023

I don't know what to do with CI. I can run the just tasks just fine on my computer, but as soon as I try to do it with act, the deny check starts failing, and something that's supposed to set up k3d keeps telling me that kubectl can't be found.

I think a lot of the CI failures at the moment are due to the resolver (because it's hitting the "multiple k8s-openapi versions enabled" message). Don't worry about the deny failure; I will fix that in main (it's related to us not maintaining a lockfile).

@casualjim (Contributor, Author)

I undid the resolver = 2 thing; hopefully that makes it go further.

When I run locally, most things seem to succeed, but there is one step that cannot find kubectl and just hangs on it.

Signed-off-by: Ivan Porto Carrero <[email protected]>
@clux (Member) commented Aug 19, 2023

Had another look over this and did another test, and it seems fine to me. Resolved a few comments per the strategy in https://github.com/kube-rs/kube/pull/1281/files (which is nice to be able to justify, as it does make invalid states unrepresentable).

One doc nit remains (the config member doc suggestion); the other two threads are just notes for the future. I am otherwise happy with this. Easy to opt in, hard to do the wrong thing, works well, CI now green. A clean addition. Thanks a lot for doing this!

@casualjim (Contributor, Author)

I updated the doc and added that extra constructor method

@clux (Member) commented Aug 20, 2023

Thanks, looks good! But needs the DCO signoff again now.

@clux clux merged commit 3af6b7f into kube-rs:main Aug 22, 2023
15 of 16 checks passed