Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Trace-level fields don't get "cleaned" for header serialization #231

Closed
ajvondrak opened this issue Sep 8, 2023 · 2 comments · Fixed by #232
Closed

Trace-level fields don't get "cleaned" for header serialization #231

ajvondrak opened this issue Sep 8, 2023 · 2 comments · Fixed by #232
Labels
type: bug Something isn't working

Comments

@ajvondrak
Copy link
Contributor

Noticed an inconsistency between the data that gets sent by Libhoney on the transmission layer vs what gets serialized into the trace header.

You can add an invalid utf-8 string to a libhoney event, and it is saved directly in a hash table: https://github.com/honeycombio/libhoney-rb/blob/a2552e822d0bf27e3d599e6e50668e9983e05b5e/lib/libhoney/event.rb#L61-L88

When the event is sent by the default transmission, that data gets munged on its way to being JSON-encoded by Libhoney::Cleaner: https://github.com/honeycombio/libhoney-rb/blob/a2552e822d0bf27e3d599e6e50668e9983e05b5e/lib/libhoney/transmission.rb#L266 -> https://github.com/honeycombio/libhoney-rb/blob/a2552e822d0bf27e3d599e6e50668e9983e05b5e/lib/libhoney/cleaner.rb

However, when you call Honeycomb::Span#to_trace_header, the "raw" trace-level fields are taken from the Honeycomb::Trace#fields and passed directly into JSON.generate:

context = Base64.urlsafe_encode64(JSON.generate(trace.fields)).strip

Without the Libhoney::Cleaner acting on the data, that means the trace header serialization blows up with an error on improperly-encoded strings, whereas the span itself can be sent to Honeycomb just fine.

Versions

  • Ruby: ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-darwin22]
  • Beeline: honeycomb-beeline (2.11.0)

Steps to reproduce

require 'honeycomb-beeline'

Honeycomb.configure { |config| config.debug = true }

span = Honeycomb.start_span(name: 'span')
span.add_trace_field('invalid_utf8', "\xAE")

begin
  span.send
  puts "Sending works fine, due to Libhoney::Cleaner#clean_string"
rescue => e
  puts "Actually, Honeycomb::Span#send didn't work: #{e}"
end

begin
  header = span.to_trace_header
  puts "Actually, Honeycomb::Span#to_trace_header worked: #{header}"
rescue => e
  puts "Honeycomb::Span#to_trace_header blew up, because the trace-level fields don't get cleaned: #{e}"
end
$ ruby /tmp/repro.rb
{"meta.beeline_version":"2.11.0","meta.local_hostname":"Alexs-MacBook-Pro.local","meta.instrumentations_count":10,"meta.instrumentations":"[\"active_support\", \"aws\", \"faraday\", \"rack\", \"rails\", \"railtie\", \"rake\", \"redis\", \"sequel\", \"sinatra\"]","service_name":null,"service.name":null,"name":"span","duration_ms":0.05299970507621765,"trace.trace_id":"957ed21fa9cd74f2808e11088f080cea","trace.span_id":"ec9ad6134bd1b0a2","meta.span_type":"root","invalid_utf8":"�"}
Sending works fine, due to Libhoney::Cleaner#clean_string
Honeycomb::Span#to_trace_header blew up, because the trace-level fields don't get cleaned: source sequence is illegal/malformed utf-8

Additional context

We were noticing this behavior in particular because we add the User-Agent header as a trace-level field. Spans normally get sent fine: the User-Agent field gets mangled with � characters (as can be seen in the repro). But on code paths where we make distributed calls and try to inject the trace header on outbound requests, we were triggering a hard error from #to_trace_header that interrupted the whole request cycle.

@ajvondrak ajvondrak added the type: bug Something isn't working label Sep 8, 2023
@robbkidd robbkidd added the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Sep 11, 2023
@vreynolds
Copy link
Contributor

Thanks for the thorough write up @ajvondrak ❤ We'll keep it in the queue, but will welcome a PR if you got one cooking.

@vreynolds vreynolds removed the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Sep 25, 2023
@robbkidd
Copy link
Member

robbkidd commented Oct 2, 2023

@ajvondrak I gave the problem a tug and out came #232.

JamieDanielson pushed a commit that referenced this issue Oct 11, 2023
…gation header (#232)

## Which problem is this PR solving?

- Resolves #231 

## Short description of the changes

### Tests to Demonstrate the Problem

* Tests added to cover invalid UTF8 characters in trace field values for
classic and E&S modes of Honeycomb propagation

### MarshalTraceContext

* Replace invalid UTF8 characters when serializing in Honeycomb-flavored
propagators

Mixes in the data cleaning logic already present in Libhoney::Cleaner to
prepare trace data for JSON serialization.

* Refactor the encoding logic for serialization in an extracted method
to avoid having the two implementations diverge.

### UnmarshalTraceContext

* Decode trace context with the decoder that matches the encoder.

Testing revealed that decoding trace context would fail to produce a
string parseable as JSON if some UTF8 characters were included in the
data, even if those characters are valid. Because we encode with
urlsafe, we need to decode with urlsafe.

### Linting

* Disable the Style/AccessModifierDeclarations Rubocop rule.

Sometimes we group methods under "private", sometimes we inline a symbol
reference to a method with "module_method". We remain flexible. This
rule does not serve us and we do not serve it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: bug Something isn't working
Projects
None yet
3 participants