
Fix delay in creating redis connections #35

Merged
merged 1 commit into from
Nov 1, 2024

Conversation

smoogipoo
Contributor

So... I'll preface this by saying that I've dug fairly deep, but haven't gone the full length necessary to determine exactly what/when/how things went wrong here.

When running osu-queue-score-statistics tests, I found them getting slower over time, and traced the cause to Redis. You can attach a log to the Redis configuration options to get a bit more info; here's an excerpt:

11:04:19.8880: Connecting (sync) on .NET 8.0.8 (StackExchange.Redis: v2.7.33.41805)
11:04:19.8880: localhost
11:04:19.8880: localhost:6379/Interactive: Connecting...
11:04:19.8881: localhost:6379: BeginConnectAsync
11:04:19.8881: 1 unique nodes specified (with tiebreaker)
11:04:19.8881: localhost:6379: OnConnectedAsync init (State=Connecting)
11:04:19.8882: Allowing 1 endpoint(s) 00:00:05 to respond...
11:04:19.8882: Awaiting 1 available task completion(s) for 5000ms, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=32763,Min=8,Max=32767), POOL: (Threads=8,QueuedItems=0,CompletedItems=206512,Timers=19)
11:04:19.8978: localhost:6379/Interactive: Connected 
11:04:19.8978: localhost:6379: Server handshake
11:04:20.9696: localhost:6379: Setting client name: mbp(SE.Redis-v2.7.33.41805)
11:04:20.9697: localhost:6379: Setting client lib/ver
11:04:20.9697: localhost:6379: Auto-configuring...
11:04:20.9697: localhost:6379: Requesting tie-break (Key="__Booksleeve_TieBreak")...
11:04:20.9697: localhost:6379/Interactive: Writing: GET __Booksleeve_TieBreak
11:04:20.9697: localhost:6379: Sending critical tracer (handshake): ECHO
11:04:20.9697: localhost:6379/Interactive: Writing: ECHO
11:04:20.9697: localhost:6379: Flushing outbound buffer
11:04:20.9697: localhost:6379: OnEstablishingAsync complete
11:04:20.9697: localhost:6379: Starting read
11:04:20.9706: localhost:6379: Auto-configured (CLIENT) connection-id: 9928
11:04:20.9707: localhost:6379: Auto-configured (CONFIG) read-only replica: true
11:04:20.9707: localhost:6379: Auto-configured (CONFIG) databases: 16
11:04:20.9707: localhost:6379: Auto-configured (INFO) role: primary
11:04:20.9707: localhost:6379: Auto-configured (INFO) version: 7.4.1
11:04:20.9707: localhost:6379: Auto-configured (INFO) server-type: Standalone
11:04:20.9707: Response from localhost:6379/Interactive / GET __Booksleeve_TieBreak: (null)
11:04:20.9707: Response from localhost:6379/Interactive / ECHO: BulkString: 16 bytes
11:04:20.9707: localhost:6379: OnConnectedAsync completed (From command: ECHO)
11:04:20.9707: All 1 available tasks completed cleanly, IOCP: (Busy=0,Free=1000,Min=1,Max=1000), WORKER: (Busy=4,Free=32763,Min=8,Max=32767), POOL: (Threads=9,QueuedItems=0,CompletedItems=206599,Timers=5)
11:04:20.9707: Endpoint summary:
11:04:20.9707:   localhost:6379: Endpoint is (Interactive: ConnectedEstablished, Subscription: ConnectedEstablished)
11:04:20.9707: Task summary:
11:04:20.9707:   localhost:6379: Returned with success as Standalone primary (Source: From command: ECHO)
11:04:20.9707: Election summary:
11:04:20.9707:   Election: localhost:6379 had no tiebreaker set
11:04:20.9707:   Election: Single primary detected: localhost:6379
11:04:20.9707: localhost:6379: Clearing as RedundantPrimary
11:04:20.9707: Endpoint Summary:
11:04:20.9707:   localhost:6379: Standalone v7.4.1, primary; 16 databases; keep-alive: 00:01:00; int: ConnectedEstablished; sub: ConnectedEstablished, 1 active
11:04:20.9707:   localhost:6379: int ops=13, qu=0, qs=1, qc=0, wr=0, socks=1; sub ops=6, qu=0, qs=0, qc=0, wr=0, subs=1, socks=1
11:04:20.9707:   localhost:6379: Circular op-count snapshot; int: 0+13=13 (1.30 ops/s; spans 10s); sub: 0+6=6 (0.60 ops/s; spans 10s)
11:04:20.9707: Sync timeouts: 0; async timeouts: 0; fire and forget: 0; last heartbeat: -1s ago
11:04:20.9707: Starting heartbeat...
11:04:20.9708: Total connect time: 1,082 ms
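
For reference, a trace like the one above can be captured by passing a `TextWriter` to `Connect`. A minimal sketch (the `localhost:6379` endpoint is an assumption — substitute your own):

```csharp
// Minimal sketch of capturing the connection trace quoted above.
using System;
using System.IO;
using StackExchange.Redis;

var options = ConfigurationOptions.Parse("localhost:6379");

using var log = new StringWriter();
using var redis = ConnectionMultiplexer.Connect(options, log);

// The writer now contains the same style of trace shown in the excerpt.
Console.WriteLine(log.ToString());
```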

Note that the entire 1s connection time was taken up between the two lines:

11:04:19.8978: localhost:6379: Server handshake
11:04:20.9696: localhost:6379: Setting client name: mbp(SE.Redis-v2.7.33.41805)

What's going on here? https://github.com/StackExchange/StackExchange.Redis/blob/c8a7265d475c24834ed3497140cdf98bbfd50975/src/StackExchange.Redis/ServerEndPoint.cs#L924-L1009

It looks like not much at all! We get a few variables from the options, call SetProtocol which is really just a property setter, and then print the client name.

An important thing to understand here is that ConnectionMultiplexer offloads the connection to a separate thread and then effectively calls .Wait() on it. That is to say, the call/profile stack isn't immediately obvious, until you look at that background thread and find:
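
That sync-over-async shape looks roughly like this (a simplified illustration of the pattern, not the library's actual code):

```csharp
// Simplified illustration (not StackExchange.Redis's actual implementation):
// the synchronous entry point kicks the async work onto the thread pool and
// blocks on it, so a profiler shows the caller idle in Wait() while the real
// time is spent on the background thread doing the connect.
public static ConnectionMultiplexer ConnectBlocking(ConfigurationOptions options)
{
    Task<ConnectionMultiplexer> task = ConnectionMultiplexer.ConnectAsync(options);
    task.Wait(); // connect time is attributed to a pool thread, not here
    return task.Result;
}
```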

[Screenshot (2024-11-01): profiler view of the background connection thread's stack]

This background thread is wasting all of its time in some TryGetAzureRoleInstanceIdNoThrow method, which just returns null.

This is where my investigation stops. For some reason Type.GetType() is getting slower over time. This could be down to how the XUnit test runner works internally; perhaps it's creating assemblies/types, and the list grows over time to a point where it eventually becomes unmanageable.
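
For future spelunking: the Azure probe boils down to a reflective type lookup of roughly this shape (the type name below is hypothetical, not taken from the library's source):

```csharp
// Hypothetical sketch of the reflective probe. Type.GetType with an
// assembly-qualified name has to search loaded assemblies when the target
// assembly isn't resolved, so its cost can grow as the test runner loads
// more assemblies over time.
Type? azureType = Type.GetType(
    "Microsoft.WindowsAzure.ServiceRuntime.RoleEnvironment, Microsoft.WindowsAzure.ServiceRuntime",
    throwOnError: false);

// Outside Azure the type is never found, so the method returns null every
// time — but the lookup cost is paid on every fresh connection.
```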

I've gone with the simplest solution here, which is to cache the config options, in the hope that this is self-explanatory/acceptable enough on its own not to have to document the insanity involved.
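
In other words, something along these lines (member and method names here are assumptions for illustration, not the exact diff):

```csharp
// Sketch of the fix: parse and cache the ConfigurationOptions once, so any
// expensive work incurred while building/using fresh options happens a
// single time rather than on every new connection.
private static ConfigurationOptions? cachedConfig;

private static ConnectionMultiplexer getConnection(string host)
{
    cachedConfig ??= ConfigurationOptions.Parse(host);
    return ConnectionMultiplexer.Connect(cachedConfig);
}
```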

@peppy (Member) left a comment:
🙈😭

@peppy peppy merged commit e26c865 into ppy:master Nov 1, 2024
2 checks passed