Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracing: W3C Trace Context Support #41

Open
4 of 15 tasks
danielkhan opened this issue Nov 20, 2023 · 9 comments
Open
4 of 15 tasks

Tracing: W3C Trace Context Support #41

danielkhan opened this issue Nov 20, 2023 · 9 comments

Comments

@danielkhan
Copy link

danielkhan commented Nov 20, 2023

While we are moving to a world where we may process, propagate, and ingest pure OpenTelemetry (OTLP) data, we have to reconsider our trace context propagation schema and adopt W3C Trace Context fully.

Project Board

About W3C Trace Context

W3C Trace Context is a standard for trace context propagation.
It defines three headers.

  • traceparent (W3C): to capture the relationship between spans and propagate sampling decisions
  • tracestate (W3C): A vendor-prefixed key-value list to hold vendor-specific (meta) data, like tenant IDs or vendor-local trace IDs
  • baggage (W3C): A userland read- and writeable key-value list of properties that can propagate arbitrary information—E.g., for a/b tests or for propagating a customer loyalty status.

Sentry has a very similar trace propagation concept.
It defines

  • sentry-trace, which is almost similar to trace-parent, except missing some flags W3C introduced
  • baggage where we follow the spec, yet we are using it to propagate essential sentry-specific metadata like the dynamic sampling context (DSC). This can have unexpected consequences as the user can remove entries or create a new Baggage object altogether (see Baggage interface in OTel).

We are not using a concept like trace-state.

Problem Description

The problem arises in mixed scenarios where one tier might be instrumented purely with OpenTelemetry.

Mixed Trace scenario

In this example, the web frontend sends sentry trace headers to the Node middleware where no Sentry SDK or propagator is present, which leads to context loss ( a broken trace).
There is also a Python backend tier with a Sentry SDK that receives a trace-parent and needs to continue this trace.

Out Of Scope

While we are currently not using the baggage header to the spec, this problem is out-of-scope for this project.

Next steps

  • We must find a migration path that lets us first emit sentry-trace and trace-parent in parallel. This can't be turned on by default because of CORS, though
  • We also need to ensure that we can pick up an incoming trace-parent header
  • Later, we should come to a world where we fully support Trace Context and deprecate sentry-trace

A quick TL;DR: How to Implement

Add support continuing a trace from the traceparent header.
If both sentry-trace and traceparent headers are present, sentry-trace takes precedence.
On outgoing HTTP requests, attach a traceparent header next to the sentry-trace header by default. Users can opt out by setting trace_propagation_tragets to [].
The W3C traceparent header only has two sampling states; hence, deferred or not-to-sample decisions are both represented with -00.
Add a new global getW3CTraceparent() function as well as Span::toW3CTraceparent().

Get inspired by the PHP SDK PR #1 & #2 and Laravel SDK PR.

Time constraints

We will soon be able to ingest OTel traces and this turns out to be rather critical. So we should have a plan by end of Q4 and implement this in Q1.

Stakeholder(s)

Team(s)

Web-Backend, Mobile, Web-Frontend

SDK Tasks (General)

Preview Give feedback

SDK Tasks (Mobile)

Preview Give feedback
  1. Platform: Cocoa
  2. Platform: Android Platform: Java
  3. 0 of 4
    Blocked Platform: React-Native
  4. Distributed Tracing Platform: Dart performance

SDK Tasks (Web Backend)

Preview Give feedback
  1. Type: Feature
    cleptric
  2. Type: Feature
    cleptric
  3. Status: Backlog
  4. Platform: Android Platform: Java
  5. Feature
  6. Triaged
  7. tracing
    sl0thentr0py
  8. on-hold

SDK Tasks (Web Frontend)

Preview Give feedback
  1. Feature: Tracing
@cleptric
Copy link
Member

We must find a migration path that lets us first emit sentry-trace and trace-parent in parallel. This can't be turned on by default because of CORS, though

This is only a concern in FE JS, all other platforms can in theory send both headers today.

We also need to ensure that we can pick up an incoming trace-parent header

The question will be what will take precedence for the moment, likely sentry-trace.

@danielkhan
Copy link
Author

The question will be what will take precedence for the moment, likely sentry-trace

Yes, sentry-trace it is.

@krystofwoldrich
Copy link
Member

React-Native will get this from JS.

@ericsampson
Copy link

Hi! I'mvery glad to see this being worked on.

We have a large mixed OTel-and-Sentry microservice system. Is there any docs/guidance on how we can get end-to-end distributed tracing working? (doesn't have to be polished, even just links to GH issues or pointers would be helpful)

I'm also wondering what direction we should take re evolving our current setup, in terms of matching the direction that Sentry is going... Should we be adding the Sentry SDK to the systems that are currently instrumented with Otel, or the reverse and moving to a "fully-OTel but with Sentry exporter" system.

The sense I'm getting is that the Sentry OTel story is still evolving, and so it may be too early for us to move to the latter direction without experiencing a ton of "bleeding-edge paper cuts" ? But at the same time I don't really feeling like doing two migrations (go all-Sentry now and then Sentry-on-OTel later) since there are so many micro services that need touching.

The trace header interior/propagation story is kinda messy in a legacy mixed system like ours. And for more fun, we also have legacy Honeycomb usage (with their proprietary headers) in the mix for some of the services, so there are 3 standards in play 🥴

Thanks!

@danielkhan
Copy link
Author

Hi @ericsampson, the direct answer is switching to Sentry SDKs is more beneficial. We're planning to support OTLP ingest this year, but OTel's data often doesn't match Sentry's quality, particularly for errors.

I suggest using our SDKs as we're integrating OTel for its valuable instrumentation. For instance, our Node.js experimental package already utilizes OTel for auto-instrumentation while maintaining compatibility with the OTel tracing SDK and providing Sentry's error tracking.

Throughout this year, we'll introduce more SDKs that integrate OTel. If installing our SDK isn't an option, you'll still be able to send data via OTLP, which is part of the improvements we're discussing here.

@ericsampson
Copy link

thanks for the info @danielkhan !

@mydea
Copy link
Member

mydea commented May 2, 2024

Snippet to generate traceparent with the JavaScript SDKs, should work in both v7 & v8:

import * as Sentry from '@sentry/node';

function getTraceparentString() {
  const span = Sentry.getActiveSpan();
  if (!span) {
    return undefined;
  }
  return`00-${span.spanContext().traceId}-${span.spanContext().spanId}-0${span.spanContext().traceFlags}`;

(for context, traceFlags is either 0 or 1)

@brustolin
Copy link

Snippet to generate traceparent with the Cocoa SDKs.

func getTraceParent() -> String? {
        guard let span = SentrySDK.span else { return nil }
        return "00-\(span.traceId.sentryIdString)-\(span.spanId.sentrySpanIdString)-0\(span.sampled == .yes ? 1 : 0)"
    }

@markushi
Copy link
Member

markushi commented May 2, 2024

Java

public static String getTraceParent() {
  final ISpan span = Sentry.getSpan();
  if (span != null) {
    final SpanContext context = span.getSpanContext();
    return "00-" + context.getTraceId() + "-" + context.getSpanId() + "-0" +
      ((context.getSampled() != null && context.getSampled()) ? "1" : "0");
  }
  return null;
}

Kotlin

fun getTraceParent(): String? = Sentry.getSpan()?.spanContext?.let { context ->
    return "00-${context.traceId}-${context.spanId}-0${if (context.sampled == true) "1" else "0"}"
}

@stephanie-anderson stephanie-anderson changed the title W3C Trace Context support Tracing: W3C Trace Context Support Jun 18, 2024
@stephanie-anderson stephanie-anderson modified the milestones: [5] On Hold, Tracing Jul 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs Discussion
Development

No branches or pull requests