Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: Pin redis-py to 4.5.5 and enhance logging for Redis reconnection and failover #1620

Merged
merged 13 commits into from
Oct 16, 2023

Conversation

achimnol
Copy link
Member

@achimnol achimnol commented Oct 15, 2023

refs #1443

This PR includes several critical fixes:

  • Upgrade redis-py (4.6.0 → 5.0.1) to get the latest bug fixes related with garbage collection of socket connections and timeout detection on pipelined commands
    • redis-py 5.0.0 and 5.0.1 have numerous issues related with connection pool handling, especially with the async version of BlockingConnectionPool. We need to wait until redis-py stabilizes........
    • This PR pins the redis-py version 4.5.5 before it introduces the connection leak due to removal of Connection.__del__().
    • Explicitly pass auto_close_connection_pool=True (NOTE: Missing of this parameter is fixed in redis-py 5.0.0 and deprecated in 5.0.1, but the whole upstream package is broken now.)
    • Set max_connections to 16 per connection pool instead of the default 2**31.... (!!)
  • Enhance logging for retries of Redis connection with explicit names for all connection pool instances
  • Fix a bug in manager.registry.handle_kernel_log() to avoid creating a new Redis connection pool upon every container destruction request

Related upstream issues

Checklist: (if applicable)

  • Milestone metadata specifying the target backport version

@achimnol achimnol added this to the 23.03 milestone Oct 15, 2023
@achimnol achimnol added type:bug Reports about that are not working type:enhance Enhance component, behavior, internals without user-facing features impact:invisible This change is invisible to users (internal changes). labels Oct 15, 2023
@achimnol achimnol self-assigned this Oct 15, 2023
@github-actions github-actions bot added comp:manager Related to Manager component comp:webserver Related to Web Server component size:L 100~500 LoC labels Oct 15, 2023
@achimnol achimnol added the urgency:5 It is imperative that action be taken right away. label Oct 15, 2023
@github-actions github-actions bot added size:XL 500~ LoC and removed size:L 100~500 LoC labels Oct 15, 2023
- In this case, we can distinguish whether the conection failure is
  caused by either the full connection pool or actual network/server failures.
@achimnol
Copy link
Member Author

Made an upstream report: redis/redis-py#3008

@achimnol achimnol changed the title fix: Enhance logging for Redis reconnection and failover fix: Pin redis-py to 4.5.5 and enhance logging for Redis reconnection and failover Oct 16, 2023
@achimnol achimnol added this pull request to the merge queue Oct 16, 2023
@github-merge-queue github-merge-queue bot removed this pull request from the merge queue due to failed status checks Oct 16, 2023
@achimnol achimnol added this pull request to the merge queue Oct 16, 2023
Merged via the queue into main with commit 1419cf3 Oct 16, 2023
@achimnol achimnol deleted the fix/enhance-redis-retry-log branch October 16, 2023 12:26
achimnol added a commit that referenced this pull request Oct 16, 2023
… and failover (#1620)

Backported-from: main
Backported-to: 23.09
achimnol added a commit that referenced this pull request Oct 16, 2023
… and failover (#1620)

This commit only includes redis-py 4.5.5 pinning, connection pool
mis-creation in `AgentRegistry.handle_kernel_log()`, and limitation of
`max_connections` for the default connection pool.

Added an explicit warning if the native sentinel client is used, because the rework
on the sentinel connection pool (#1586) targets 23.09 only.

Backported-from: main
Backported-to: 23.03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
comp:manager Related to Manager component comp:webserver Related to Web Server component impact:invisible This change is invisible to users (internal changes). size:XL 500~ LoC type:bug Reports about that are not working type:enhance Enhance component, behavior, internals without user-facing features urgency:5 It is imperative that action be taken right away.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant