telemetry worker: flush data after stops, and two other fixes #515
Merged
Commits (6):

- 363d27d (cataphract): telemetry worker: flush data after stops
- b80b607 (cataphract): Fix configurations, integrations, deps being relayed over and over
- b439363 (cataphract): Fix actions not being relayed in some circumstances
- 6b45e4a (cataphract): Alternative method to prevent repetition of confs, integrations, deps
- 67da02b (cataphract): Add comment
- cbce086 (cataphract): Merge branch 'main' into glopes/flush-data-after-stop
```diff
@@ -241,10 +241,10 @@ impl TelemetryWorker {
             Lifecycle(Start) => {
                 if !self.data.started {
                     self.deadlines
-                        .schedule_event(LifecycleAction::FlushData)
+                        .schedule_event(LifecycleAction::FlushMetricAggr)
                         .unwrap();
                     self.deadlines
-                        .schedule_event(LifecycleAction::FlushMetricAggr)
+                        .schedule_event(LifecycleAction::FlushData)
                         .unwrap();
                     self.data.started = true;
                 }
```
```diff
@@ -265,7 +265,7 @@ impl TelemetryWorker {
                     .unwrap();
             }
             Lifecycle(FlushData) => {
-                if !self.data.started {
+                if !(self.data.started || self.config.restartable) {
                     return CONTINUE;
                 }
                 let batch = self.build_observability_batch();
```
```diff
@@ -296,7 +296,9 @@ impl TelemetryWorker {
                     self.log_err(&e);
                 }
                 self.data.started = false;
-                self.deadlines.clear_pending();
+                if !self.config.restartable {
+                    self.deadlines.clear_pending();
+                }
                 return BREAK;
             }
             CollectStats(stats_sender) => {
```
```diff
@@ -341,10 +343,11 @@ impl TelemetryWorker {
                     Err(err) => self.log_err(&err),
                 }
                 self.deadlines
-                    .schedule_event(LifecycleAction::FlushData)
+                    .schedule_event(LifecycleAction::FlushMetricAggr)
                     .unwrap();
+                // flush data should be last to previously flushed metrics are sent
                 self.deadlines
-                    .schedule_event(LifecycleAction::FlushMetricAggr)
+                    .schedule_event(LifecycleAction::FlushData)
                     .unwrap();
                 self.data.started = true;
             }
```
```diff
@@ -368,7 +371,7 @@ impl TelemetryWorker {
                     .unwrap();
             }
             Lifecycle(FlushData) => {
-                if !self.data.started {
+                if !(self.data.started || self.config.restartable) {
                     return CONTINUE;
                 }
                 let mut batch = self.build_app_events_batch();
```
```diff
@@ -427,38 +430,36 @@ impl TelemetryWorker {

                 let obsevability_events = self.build_observability_batch();

-                future::join_all(
-                    [
-                        Some(self.build_request(&data::Payload::MessageBatch(app_events))),
-                        if obsevability_events.is_empty() {
-                            None
-                        } else {
-                            Some(
-                                self.build_request(&data::Payload::MessageBatch(
-                                    obsevability_events,
-                                )),
-                            )
-                        },
-                    ]
-                    .into_iter()
-                    .flatten()
-                    .filter_map(|r| match r {
-                        Ok(r) => Some(r),
-                        Err(e) => {
-                            self.log_err(&e);
-                            None
-                        }
-                    })
-                    .map(|r| async {
-                        if let Err(e) = self.send_request(r).await {
-                            self.log_err(&e);
-                        }
-                    }),
-                )
-                .await;
+                let mut payloads = vec![data::Payload::MessageBatch(app_events)];
+                if !obsevability_events.is_empty() {
+                    payloads.push(data::Payload::MessageBatch(obsevability_events));
+                }
+
+                let self_arc = Arc::new(tokio::sync::RwLock::new(self));
+                let futures = payloads.into_iter().map(|payload| {
+                    let self_arc = self_arc.clone();
+                    async move {
+                        // This is different from the non-functional:
+                        // match self_arc.read().await.send_payload(&payload).await { ... }
+                        // presumably because the temp read guard would live till end of match
+                        let res = {
+                            let self_rguard = self_arc.read().await;
+                            self_rguard.send_payload(&payload).await
+                        };
+                        match res {
+                            Ok(()) => self_arc.write().await.payload_sent_success(&payload),
+                            Err(err) => self_arc.read().await.log_err(&err),
+                        }
+                    }
+                });
+                future::join_all(futures).await;
+
+                let mut self_lock = self_arc.write().await;
+                self_lock.data.started = false;
+                if !self_lock.config.restartable {
+                    self_lock.deadlines.clear_pending();
+                }

-                self.data.started = false;
-                self.deadlines.clear_pending();
                 return BREAK;
             }
             CollectStats(stats_sender) => {
```

A reviewer remarked on the guard-scope comment in this hunk: "Interesting find".
```diff
@@ -470,7 +471,7 @@ impl TelemetryWorker {
     }

     // Builds telemetry payloads containing lifecycle events
-    fn build_app_events_batch(&self) -> Vec<Payload> {
+    fn build_app_events_batch(&mut self) -> Vec<Payload> {
         let mut payloads = Vec::new();

         if self.data.dependencies.flush_not_empty() {
```
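The signature change from `&self` to `&mut self` suggests that `flush_not_empty` now drains the stored items as a side effect, which requires mutable access. A sketch under that assumption (the `Store` type and the exact semantics of `flush_not_empty` are guesses from the name, not confirmed by the diff):

```rust
// Illustrative store whose flush reports whether anything was pending
// and empties itself as a side effect (assumed semantics).
struct Store {
    items: Vec<String>,
}

impl Store {
    /// Returns true if there was anything to flush, draining the
    /// store in the process; hence the need for `&mut self` all the
    /// way up the call chain.
    fn flush_not_empty(&mut self) -> bool {
        let had_items = !self.items.is_empty();
        self.items.clear();
        had_items
    }
}

fn main() {
    let mut deps = Store { items: vec!["serde".to_string()] };
    assert!(deps.flush_not_empty());  // first flush finds and drains the item
    assert!(!deps.flush_not_empty()); // nothing left to flush afterwards
}
```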
Review discussion

Reviewer: Wouldn't it be enough to only include `self.data.started = false;` inside the if statement as well, and leave the early-exit checks?

cataphract (author): I guess it depends. Do we want Start, Stop, Stop to generate two stops? Because that's what PHP ends up generating. The second stop is a no-op, but if I moved the assignment `self.data.started = false` under the condition `if !self.config.restartable`, then the stops would be effective.

Reviewer: If the early-exit checks are still in the code, then how would the second `Stop` be effective?

cataphract: From what I understand, what you're proposing is that I reset `started = false` only if `!restartable`. In that case `started` would stay true forever once there is a start, so the early check would never be hit.

Reviewer: Yes, you're right. I was confusing the early checks. Still, something in the logic feels a bit broken. The `FlushData` handler is also protected by this `!self.data.started` check. Should it work after a restartable stop? The things that `Stop` does, should they happen when the first request ends, or when the worker stops?

cataphract: My guess is yes, otherwise there is no way to send the data that's collected after the stop (`build_observability_batch` is called only from the handlers of Stop and FlushData), at least not without an intervening start+stop. That is a good point, though: the final flush of the metrics should happen when the worker stops, not when handling Stop. I guess at some point they were the same, but then the restart mechanism was introduced. Regardless, once we have a way to send metrics after a Stop, periodic flushes still need to happen for that to work, so FlushData shouldn't be skipped or unscheduled after a Stop.