In some of our metrics, we include the HTTP route as a tag. If users include data with high cardinality in their HTTP requests, such as a per-user ID, this blows up the cardinality of the Prometheus metrics (and can render the metrics unusable).
We need to reduce the cardinality here, perhaps by only exporting the Serve-level route_prefix instead of the full route.
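To make the problem concrete, here is a minimal sketch that compares the number of distinct tag values produced by tagging with the full route versus the route_prefix. `match_route_prefix`, the `/api` and `/admin` prefixes, and the request paths are all hypothetical and only for illustration; this is not how Serve resolves prefixes internally.

```python
from typing import List, Optional


def match_route_prefix(path: str, route_prefixes: List[str]) -> Optional[str]:
    """Return the longest prefix that matches `path` on a segment boundary.

    Hypothetical helper for illustration only.
    """
    best = None
    for prefix in route_prefixes:
        stem = prefix.rstrip("/")  # "/" becomes "", which matches every path
        if path == stem or path.startswith(stem + "/"):
            if best is None or len(prefix) > len(best):
                best = prefix
    return best


# Simulate 1000 requests that embed a per-user ID in the path.
paths = [f"/api/users/{user_id}/profile" for user_id in range(1000)]

# Tagging with the full route creates one time series per user.
full_route_tags = set(paths)

# Tagging with the route_prefix collapses them into a single series.
prefix_tags = {match_route_prefix(p, ["/api", "/admin"]) for p in paths}

print(len(full_route_tags), len(prefix_tags))
```

Every metric that carries the tag is multiplied by the number of distinct values, so the gap between the two sets grows with the number of users.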
edoakes added the bug, P1, and serve labels on Oct 11, 2024.
Ok, I did some prototyping here. We have metrics containing the route in two places: the proxy and the replica.
In the proxy, we don't have access to application-defined routes (by design), so we can't do anything too clever. We could try to do something like auto-detect the cardinality and cap the number of tags, but that seems excessively complex.
In the replica, we do have access to the underlying ASGI app, which we can use to identify the matched route string (e.g., /path/{wildcard}).
So I'd propose that we:
1. Update proxy metrics to export the matched route_prefix under the existing route tag.
2. Update replica metrics to export the matched route handler string for FastAPI applications, else the route_prefix for applications that use the raw ASGI request.
We could consider changing the metric tag for the proxy metrics to route_prefix for clarity, but that introduces a migration for what seems to me like a very minor improvement.
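The replica-side half of the proposal above could look roughly like the sketch below. It uses plain standard-library matching as a stand-in for whatever route matching the ASGI framework actually performs; `compile_template`, `route_tag`, and the example templates are hypothetical names, and prepending the route_prefix to the matched template is an assumption about what the exported tag would contain.

```python
import re
from typing import List


def compile_template(template: str) -> re.Pattern:
    """Turn '/users/{user_id}' into a regex where each {param} matches one path segment."""
    parts = re.split(r"(\{[^{}]+\})", template)
    pattern = "".join(
        "[^/]+" if part.startswith("{") and part.endswith("}") else re.escape(part)
        for part in parts
    )
    return re.compile(f"^{pattern}$")


def route_tag(path: str, route_prefix: str, templates: List[str]) -> str:
    """Return a low-cardinality value for the `route` tag.

    If the path under `route_prefix` matches a known template, export the
    template string; otherwise fall back to `route_prefix` alone.
    """
    stem = route_prefix.rstrip("/")
    if path != stem and not path.startswith(stem + "/"):
        return route_prefix
    subpath = path[len(stem):] or "/"
    for template in templates:
        if compile_template(template).match(subpath):
            return stem + template
    return route_prefix


templates = ["/users/{user_id}/profile", "/health"]

# Matched handler: the per-user ID is replaced by the template parameter.
matched = route_tag("/api/users/42/profile", "/api", templates)

# No matching handler (e.g., a raw ASGI app): fall back to the route_prefix.
fallback = route_tag("/api/unknown/123", "/api", templates)

print(matched, fallback)
```

With this shape, the number of distinct tag values is bounded by the number of declared routes plus the number of route prefixes, regardless of what users put in their request paths.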