
fix(llmobs): propagate distributed headers via signal dispatching [backport #12089] #12134

Merged: 2 commits into 2.20 on Jan 30, 2025

Conversation

@Yun-Kim (Contributor) commented Jan 28, 2025

Backports #12089 to 2.20.

Note that we had to manually cherry-pick from #12089 to account for #11952 not being backported.

This PR changes our shared distributed tracing header injection method to dispatch signals/events instead of relying on the global config settings, which are only modifiable via env vars. This fixes distributed tracing for users who rely solely on the LLMObs.enable() setup config.

Programmatic LLMObs.enable()/disable() calls do not set the global config._llmobs_enabled boolean setting, which is controlled only by the DD_LLMOBS_ENABLED env var. This was problematic for users who relied on manual LLMObs.enable() setup (i.e. no env vars), because our distributed tracing injection code checks only the global config before injecting LLMObs parent IDs into request headers. If users enabled LLMObs manually without any env vars, the global config value never reflected this, so LLMObs parent IDs were never injected into the request headers.
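A toy reproduction of that mismatch. The `Config` and `LLMObs` classes below are illustrative stand-ins, not the real ddtrace implementations:

```python
import os

os.environ.pop("DD_LLMOBS_ENABLED", None)  # simulate a "no env vars" setup

class Config:
    def __init__(self):
        # Env-var-driven settings are typically read once at startup.
        self._llmobs_enabled = os.environ.get("DD_LLMOBS_ENABLED", "").lower() in ("1", "true")

config = Config()

class LLMObs:
    enabled = False

    @classmethod
    def enable(cls):
        # Flips the product's own state; the global config is untouched.
        cls.enabled = True

def should_inject_llmobs_parent_id():
    # The old injection check: consults only the global config.
    return config._llmobs_enabled

LLMObs.enable()
print(LLMObs.enabled)                    # True
print(should_inject_llmobs_parent_id())  # False: parent IDs never injected
```

The two sources of truth disagree: the product is enabled, but the injection path, which reads only the global config, never sees it.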

We can't check directly whether LLMObs is enabled inside the HTTP injection module because:

  1. Doing so would require importing significant product-specific LLMObs code into the shared HTTP injector helper module, which would impact app performance for non-LLMObs users
  2. It would create circular imports, since LLMObs itself imports the HTTP injector logic for use in its own helpers

Instead of basing the check on the global config._llmobs_enabled setting, we now send a tracing event to our shared product listeners and register a corresponding LLMObs._inject_llmobs_context() hook to be called on every inject() call when LLMObs is enabled (we check the LLMObs instance, not the global config setting value).
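The dispatch pattern can be sketched with a minimal signal registry. The names here (`on`, `dispatch`, the event name, `_inject_llmobs_context`) are illustrative and not the actual ddtrace internals:

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, List

_listeners: DefaultDict[str, List[Callable]] = defaultdict(list)

def on(event: str, hook: Callable) -> None:
    """Register a hook to run whenever `event` is dispatched."""
    _listeners[event].append(hook)

def dispatch(event: str, *args) -> None:
    """Invoke every hook registered for `event`."""
    for hook in _listeners[event]:
        hook(*args)

# Shared HTTP propagation code: knows nothing about LLMObs, just dispatches.
def inject_headers(span_context: dict, headers: Dict[str, str]) -> None:
    headers["x-datadog-trace-id"] = str(span_context["trace_id"])
    dispatch("http.span_inject", span_context, headers)

# Product-side hook, registered only when the product is enabled.
class FakeLLMObs:
    enabled = False

    @classmethod
    def enable(cls) -> None:
        cls.enabled = True
        on("http.span_inject", cls._inject_llmobs_context)

    @classmethod
    def _inject_llmobs_context(cls, span_context: dict, headers: Dict[str, str]) -> None:
        if cls.enabled:
            headers["x-datadog-parent-id-llmobs"] = str(span_context["llmobs_parent_id"])

FakeLLMObs.enable()
headers: Dict[str, str] = {}
inject_headers({"trace_id": 123, "llmobs_parent_id": 456}, headers)
print(headers["x-datadog-parent-id-llmobs"])  # 456
```

The injector stays decoupled: it emits an event and never imports product code, while the product decides at registration time whether it wants to participate.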

One risk, and the reason I don't like changing global config settings, is that doing so implies the setting is no longer global or tied to an env var (I want to push for env var configuration where possible over manual overriding/enabling). If a global enabled config can be toggled indiscriminately, this could open a can of worms for the enabling/disabling logic in our LLMObs service, which isn't really designed to be toggled on/off multiple times in an app's lifespan. However, if some users cannot rely on env vars, I don't see any other solution that doesn't couple tracer-internal code with LLMObs code, which is a non-option. (UPDATE: we avoided this issue by using signal dispatching)

Checklist

  • PR author has checked that all the criteria below are met
  • The PR description includes an overview of the change
  • The PR description articulates the motivation for the change
  • The change includes tests OR the PR description describes a testing strategy
  • The PR description notes risks associated with the change, if any
  • Newly-added code is easy to change
  • The change follows the library release note guidelines
  • The change includes or references documentation updates if necessary
  • Backport labels are set (if applicable)

Reviewer Checklist

  • Reviewer has checked that all the criteria below are met
  • Title is accurate
  • All changes are related to the pull request's stated goal
  • Avoids breaking API changes
  • Testing strategy adequately addresses listed risks
  • Newly-added code is easy to change
  • Release note makes sense to a user of the library
  • If necessary, author has acknowledged and discussed the performance implications of this PR as reported in the benchmarks PR comment
  • Backport labels are set in a manner that is consistent with the release branch maintenance
    policy


CODEOWNERS have been resolved as:

releasenotes/notes/fix-llmobs-enable-updates-config-45379a7a30e2e0e3.yaml  @DataDog/apm-python
ddtrace/llmobs/_llmobs.py                                               @DataDog/ml-observability
ddtrace/propagation/http.py                                             @DataDog/apm-sdk-api-python
tests/llmobs/test_llmobs_service.py                                     @DataDog/ml-observability
tests/llmobs/test_propagation.py                                        @DataDog/ml-observability
tests/tracer/test_propagation.py                                        @DataDog/apm-sdk-api-python

@Yun-Kim Yun-Kim marked this pull request as ready for review January 28, 2025 22:20
@Yun-Kim Yun-Kim requested review from a team as code owners January 28, 2025 22:20
@Yun-Kim Yun-Kim changed the title fix(llmobs): propagate distributed headers via signal dispatching, not config [backport #12089 to 2.20] fix(llmobs): propagate distributed headers via signal dispatching [backport #12089 to 2.20] Jan 28, 2025
@Yun-Kim Yun-Kim changed the title fix(llmobs): propagate distributed headers via signal dispatching [backport #12089 to 2.20] fix(llmobs): propagate distributed headers via signal dispatching [backport #12089] Jan 28, 2025
datadog-dd-trace-py-rkomorn bot commented Jan 28, 2025

Datadog Report

Branch report: yunkim/backport-12089-to-220
Commit report: 83eda33
Test service: dd-trace-py

✅ 0 Failed, 130 Passed, 1378 Skipped, 4m 2.35s Total duration (35m 11.99s time saved)

pr-commenter bot commented Jan 28, 2025

Benchmarks

Benchmark execution time: 2025-01-30 02:30:14

Comparing candidate commit 83eda33 in PR branch yunkim/backport-12089-to-220 with baseline commit f6d814c in branch 2.20.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 394 metrics, 2 unstable metrics.

@Yun-Kim Yun-Kim enabled auto-merge (squash) January 29, 2025 18:38
@Yun-Kim Yun-Kim merged commit bca45d4 into 2.20 Jan 30, 2025
585 of 586 checks passed
@Yun-Kim Yun-Kim deleted the yunkim/backport-12089-to-220 branch January 30, 2025 02:31