
Introduce request ID and request context passed through webserver, manager, storage-proxy, and agent #1678

Open
achimnol opened this issue Oct 31, 2023 · 0 comments
achimnol commented Oct 31, 2023

To make postmortem analysis of unexpected failures easier at customer sites, let's introduce request IDs that are attached to every log message whenever one is available.

This will make it easier to search across the various daemon logs when we visit customer sites. We could tell end users to report the request IDs they see.

  • The webserver/storage-proxy's client-facing API/app-proxy generate a new request ID for each API call from the frontend.

      ◦ If the request has failed (5xx), show an error message like "The server encountered a problem. Please report this to the administrator or support staff. (request ID: XXXXXXX)".

  • They pass this request context (including the URL, source IP, etc.) when calling the manager API.

  • The manager passes this request context when calling the agent RPC function / storage-proxy's manager-facing APIs.

  • All webserver/manager/agent/storage-proxy daemons should include the request context information in every log message whenever available.

      ◦ Inside each daemon, we could use contextvars to keep track of the request context. common.logging should be updated to be aware of this.

  • For internally invoked tasks such as timers, we could generate a different type of request ID for the same purpose.

      ◦ Note that the scheduler should keep the request context used in enqueue_session() and destroy_session() so that any actions triggered by a specific session creation/destruction request leave logs with the same request ID. Probably we could attach both the user-request ID and the system-request ID (e.g., a specific invocation of the dispatcher) to the same log.
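
As a rough illustration of the contextvars idea above, here is a minimal sketch using only the Python standard library. Every name in it (`user_request_id`, `system_request_id`, `RequestContextFilter`, `outgoing_headers`, the `X-Request-ID` header, the ID format) is hypothetical and chosen for illustration; it is not the existing common.logging API, and the real design may well differ.

```python
import contextvars
import io
import logging
import uuid

# Hypothetical context variables; the names are assumptions for this sketch.
user_request_id = contextvars.ContextVar("user_request_id", default=None)
system_request_id = contextvars.ContextVar("system_request_id", default=None)

# Assumed header name for passing the ID between daemons.
REQUEST_ID_HEADER = "X-Request-ID"


def new_request_id(prefix):
    """Generate a short ID; the prefix separates user and system requests."""
    return f"{prefix}-{uuid.uuid4().hex[:12]}"


class RequestContextFilter(logging.Filter):
    """Copy the current request context onto every log record."""

    def filter(self, record):
        record.user_request_id = user_request_id.get() or "-"
        record.system_request_id = system_request_id.get() or "-"
        return True  # never drop a record, only annotate it


def outgoing_headers():
    """Headers a daemon could attach when calling the next daemon."""
    rid = user_request_id.get()
    return {REQUEST_ID_HEADER: rid} if rid else {}


def make_logger(stream):
    """Build a logger whose format includes both request IDs."""
    logger = logging.getLogger("request_context_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "[req:%(user_request_id)s sys:%(system_request_id)s] %(message)s"
    ))
    handler.addFilter(RequestContextFilter())
    logger.handlers = [handler]
    return logger


def demo():
    """Log once without a context and once with both IDs set."""
    buf = io.StringIO()
    log = make_logger(buf)
    log.info("no request context yet")
    user_token = user_request_id.set(new_request_id("req"))
    sys_token = system_request_id.set(new_request_id("sys"))
    try:
        log.info("enqueue_session called")
    finally:
        # Reset so the IDs do not leak into unrelated work.
        system_request_id.reset(sys_token)
        user_request_id.reset(user_token)
    return buf.getvalue()
```

Because context variables are copied into each asyncio task when it is created, an ID set at the start of a request handler would be visible to log calls made further down the same call chain without passing it explicitly. Keeping the user-request ID and the system-request ID in separate variables is one way to attach both to the same log line.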

@achimnol achimnol added type:feature Add new features platform:enterprise Backend.AI Enterprise support. comp:manager Related to Manager component comp:agent Related to Agent component comp:common Related to Common component comp:webserver Related to Web Server component comp:storage-proxy Related to Storage proxy component effort:normal Need to understand a few modules / some extent of contextual or historical information. urgency:3 Must be finished within a certain time frame. labels Oct 31, 2023
@achimnol achimnol added this to the 24.03 milestone Oct 31, 2023
@achimnol achimnol changed the title Introduce request ID and request context passed through webserver, manager, and agent Introduce request ID and request context passed through webserver, manager, storage-proxy, and agent Oct 31, 2023
@achimnol achimnol removed the type:feature Add new features label Oct 18, 2024
@achimnol achimnol removed platform:enterprise Backend.AI Enterprise support. comp:manager Related to Manager component comp:agent Related to Agent component comp:common Related to Common component comp:webserver Related to Web Server component comp:storage-proxy Related to Storage proxy component effort:normal Need to understand a few modules / some extent of contextual or historical information. urgency:3 Must be finished within a certain time frame. labels Jan 19, 2025