Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

When does the "Incorrect span sent" error occur? #235

Closed
mslaughter opened this issue Oct 26, 2023 · 5 comments
Closed

When does the "Incorrect span sent" error occur? #235

mslaughter opened this issue Oct 26, 2023 · 5 comments
Labels
type: discussion Requests for comments, discussions about possible enhancements.

Comments

@mslaughter
Copy link

Hi!

My company is using Honeycomb with ActiveJob to track job spans. We've recently run into an issue where a certain job is fairly consistently raising an Incorrect span sent error, and I'm curious under what situations this error comes up.

The code snippet from our override of ActiveJob's perform_now:

def perform_now
    Honeycomb.start_span(name: "jobs.#{self.class.name}") do |span|
      # application code
      add_honeycomb_fields(self, span)
    end
end

The error comes right at the beginning in Honeycomb.start_span(name: "jobs.#{self.class.name}") and traces back to this line in the honeycomb library.

Would appreciate any insight or pointers on where I should look next! Also let me know if there's any additional context I can provide that would help.

@mslaughter mslaughter added the type: discussion Requests for comments, discussions about possible enhancements. label Oct 26, 2023
@robbkidd robbkidd added the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Oct 30, 2023
@rpeng
Copy link

rpeng commented Nov 1, 2023

We were able to come up with a way to reproduce this. The error happens when initialization is interrupted for nested spans - after current_span is updated and before the children list is updated

Here's some code that repros the issue in honeycomb-beeline v2.8.1:

class Honeycomb::Span
    def create_child
      self.class.new(trace: trace,
                     builder: builder,
                     context: context,
                     parent: self,
                     parent_id: id,
                     sample_hook: sample_hook,
                     presend_hook: presend_hook,
                     propagation_hook: propagation_hook).tap do |c|
        # adding a sleep to make the repro more reliable
        sleep(2)
        children << c
      end
    end
end


a = Thread.new do
    Honeycomb.start_span(name: "outer-span") do |span|
        Honeycomb.start_span(name: "inner-span") do |span|
        end
    end
end

sleep(1)
a.raise Exception

@robbkidd
Copy link
Member

a certain job is fairly consistently raising an Incorrect span sent error

Can you say more about "fairly consistently"? Does the error occur on every execution of the job? Every time to job fails? On a consistent time interval?

a = Thread.new do..end
sleep(1)
a.raise Exception

Yes, one way that the latest span in the context (spans.last) would not match the span being queued for transmission (which would generate the Incorrect span sent error) is if an exception is injected from outside the thread's execution, interrupting the management of context state.

@robbkidd robbkidd removed the status: oncall Flagged for awareness from Honeycomb Telemetry Oncall label Nov 28, 2023
@mslaughter
Copy link
Author

mslaughter commented Nov 28, 2023

a certain job is fairly consistently raising an Incorrect span sent error

We've seen it often during shutdown routines, e.g. when sidekiq raises an exception to all of its threads.

@robbkidd
Copy link
Member

Ah, I see. Yes, any time exceptions are injected from outside a thread, the risk increases for timing problems. The Beeline maintains state for a trace to control the timing of sends and to apply trace-level fields. Exceptions injected from outside of the current thread introduce race conditions for the maintenance of this state. Handling injected exceptions gets tricky—some maintainers of threaded Ruby processes believe it is futile to try—so I cannot currently picture any changes we would make to the Beeline to address this that wouldn't also break backwards compatibility.

Have you considered migrating off the Beeline and onto OpenTelemetry Ruby?

@mslaughter
Copy link
Author

Gotcha, thanks for the reply! I believe we've determined that the error is benign for our side, and so we're not urgently trying to fix it anymore.

Will take a look at OpenTelemetry Ruby – thanks for the rec!

(also not sure the etiquette here, but I closed this thread since my question has been answered!)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type: discussion Requests for comments, discussions about possible enhancements.
Projects
None yet
Development

No branches or pull requests

3 participants