Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Support for deleting keys in memberlist KV store. #605

Draft
wants to merge 17 commits into
base: main
Choose a base branch
from

Conversation

pstibrany
Copy link
Member

What this PR does:

This PR implements deletion of keys in memberlist KV Store. Untested.

Which issue(s) this PR fixes:

Fixes #

Checklist

  • Tests updated
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

pstibrany and others added 17 commits October 9, 2024 15:38
Add an `.Advance()` method to MockCache and InstrumentedMockCache to
allow the time considered "now" to be moved without needing to actually
sleep. This is useful for testing when items are set with a TTL and you
would like for them to actually expire as they would in a real cache.

Part of grafana/mimir#9386

Signed-off-by: Nick Pillitteri <[email protected]>
* Really fix flaky `TestMultipleClients` tests
#599 went in a wrong direction, so I revert it here: it makes all `TestMultipleClients` tests flaky since not all members are necessarily joined after the updates

Instead, to fix the `TestMultipleClientsWithMixedLabelsAndExpectFailure` test, I add a couple more labeled members, that way, even if we reach 3 updates (which happened some times), we'll never get to 5 with a single member

Also, try to fix the `TestTLSServerWithLocalhostCertWithClientCertificateEnforcementUsingClientCA1` test by closing GRPC connections. Maybe this (golang/go#36700) is the issue?

* Set `MaxIdleConnsPerHost`

* New attempt to fix this test. Could it be aborted connections?

* Retry RSTs
* make WriteTo non-blocking

* Try to make this PR ready to go
- Create goroutines and keep them while the TCPTransport is alive. End them on the `Shutdown` function
- Add `TestTCPTransportWriteToUnreachableAddr` test to check that writing is not blocking anymore (without this PR, it takes `writeCt * timeout` to run and it fails)

* Add CHANGELOG

* Address PR comments
- Rename CHANGELOG
- Mutex lock on shutdown rather than write
- Wait when workers are ended rather than for each write

* Address PR comments
- Move variables around
- Add timeout before dropping requests. This prevents blocking on the `WriteTo` function

---------

Co-authored-by: Julien Duchesne <[email protected]>
…#608)

Closes #389

These tests are highly dependant on timing, which makes them flaky. Poll for a bit longer at the end to make sure that the memberlist was updated
* Adds a layer of buffering to Memberlist's notification handling so that notifications are fired at most once per NotifyInterval, at which point it will deliver notifications using the most recently-observed data.
* Adds a new config flag to control this interval: -memberlist.notify-interval which defaults to 0 (off).
Motivation for this change:

In clusters where the memberlist KVStore watched by Ring has many replicas, redeploying those replicas can cause WatchKey and updateRingState to be called hundreds of times per second. When there are many concurrent goroutines calling ring.ShuffleShard, the high rate of updateRingState calls (which take locks and clear caches) can create heavy lock contention and latency as ShuffleShard attempts to take locks in order to repopulate those caches.
* Extend existing tests to catch issue

* Add further test cases

* Fix issue where caller information is logged twice

* Add benchmark comparing the performance of `Caller` and `SpanLoggerAwareCaller`

* Change name to satisfy linter

* Fix tests broken by removal of `caller` key/value pair from `SpanLogger`

* Don't bother with checking for go-kit/log stack frames until we've seen SpanLogger

This makes the cost of spanlogger.Caller negligible for use outside of
SpanLogger:

                            │ SpanLoggerAwareCaller/with_go-kit's_Caller-10 │ SpanLoggerAwareCaller/with_dskit's_spanlogger.Caller-10 │
                            │                    sec/op                     │                 sec/op                  vs base         │
comparison-after-step-2.txt                                     927.5n ± 5%                              932.1n ± 2%  ~ (p=0.093 n=6)

                            │ SpanLoggerAwareCaller/with_go-kit's_Caller-10 │ SpanLoggerAwareCaller/with_dskit's_spanlogger.Caller-10 │
                            │                     B/op                      │                  B/op                   vs base         │
comparison-after-step-2.txt                                      732.5 ± 0%                               733.0 ± 0%  ~ (p=0.636 n=6)

                            │ SpanLoggerAwareCaller/with_go-kit's_Caller-10 │ SpanLoggerAwareCaller/with_dskit's_spanlogger.Caller-10 │
                            │                   allocs/op                   │              allocs/op                vs base           │
comparison-after-step-2.txt                                      9.000 ± 0%                             9.000 ± 0%  ~ (p=1.000 n=6) ¹
¹ all samples are equal

* Reuse slice and use string equality check

* Address PR feedback: use const in test

Co-authored-by: Arve Knudsen <[email protected]>

---------

Co-authored-by: Arve Knudsen <[email protected]>
I find it unsettling that we need to acquire a mutex (a concurrency
primitive) to send data to a channel (another concurrency primitive).

I think this achieves the same effect (we can stop the worker
goroutines in a safe way) without having to mix them and leaving the
main code path clear (just try to send a job, or run a new goroutine).

Signed-off-by: Oleg Zaytsev <[email protected]>
- Do not increment the `packets_sent_errors_total`, instead have a new `packets_dropped_total` so we can isolate these
- Debug messages instead of warn. These logs are all mostly the same, they don't need to be "warn" level
Closes #381

The MacOS network dialog is a bit annoying
…s-in-memberlist-KV-store' into Initial-support-for-deleting-keys-in-memberlist-KV-store
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants