Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

telemetry worker: flush data after stops, and two other fixes #515

Merged
merged 6 commits into from
Dec 6, 2024

Conversation

cataphract
Copy link
Contributor

@cataphract cataphract commented Jul 1, 2024

Telemetry workers are functionally dead after a Stop lifecycle action, provided there's no intervening Start. While AddPoint actions are still processed, their data is never flushed, since the Stop action handler unschedules FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via ddog_sidecar_telemetry_end(), but a Start action is only generated just after a telemetry worker is spawned. With no more Start actions generated, no metrics can effectively be sent after the first Stop.

It is not clear to me whether the intention is to have a Start/Stop pair on every PHP request (where Stop flushes the metrics) or if the intention is to to have only such a pair in the first request, with the Stop event generated by ddog_sidecar_telemetry_end() effectively a noop. It would appear, judging by this
comment
:

Also allow the telemetry worker to have a mode where it's continuing
execution after a start-stop cycle, otherwise it won't send any more metrics afterwards.

that the intention is to keep sending metrics after a Start/Stop pair. It also makes more sense, insofar as data is flushed only on the interval, rather than after every request via Stop. In that case:

  • The Stop action handler should not unschedule FlushData and FlushMetrics events and
  • FlushData, if called outside a Start-Stop pair, should not be a noop.

Finally: swap the order in which FlushData and FlushMetrics are scheduled so that FlushMetrics runs first and therefore its generated data can be sent by the next FlushData.

@cataphract cataphract requested a review from a team as a code owner July 1, 2024 13:55
@cataphract cataphract requested review from pawelchcki, bwoebi and bantonsson and removed request for a team and pawelchcki July 1, 2024 13:55
@pr-commenter
Copy link

pr-commenter bot commented Jul 1, 2024

Benchmarks

Comparison

Benchmark execution time: 2024-12-06 11:15:26

Comparing candidate commit cbce086 in PR branch glopes/flush-data-after-stop with baseline commit fd011c1 in branch main.

Found 0 performance improvements and 0 performance regressions! Performance is the same for 51 metrics, 2 unstable metrics.

Candidate

Candidate benchmark details

Group 1

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time 508.601µs 509.656µs ± 1.294µs 509.512µs ± 0.324µs 509.823µs 510.362µs 512.166µs 523.816µs 2.81% 8.200 80.075 0.25% 0.091µs 1 200
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput 1909066.449op/s 1962119.747op/s ± 4880.429op/s 1962661.293op/s ± 1247.720op/s 1963904.986op/s 1965364.520op/s 1965784.276op/s 1966176.445op/s 0.18% -8.088 78.315 0.25% 345.098op/s 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time 465.969µs 467.288µs ± 0.590µs 467.290µs ± 0.267µs 467.533µs 468.058µs 468.452µs 471.080µs 0.81% 2.195 13.344 0.13% 0.042µs 1 200
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput 2122782.879op/s 2140013.050op/s ± 2696.197op/s 2139998.716op/s ± 1223.498op/s 2141385.363op/s 2143702.016op/s 2145868.807op/s 2146064.784op/s 0.28% -2.156 13.070 0.13% 190.650op/s 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time 175.838µs 176.839µs ± 0.367µs 176.844µs ± 0.209µs 177.053µs 177.490µs 177.706µs 177.768µs 0.52% -0.082 0.280 0.21% 0.026µs 1 200
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput 5625319.426op/s 5654881.330op/s ± 11749.493op/s 5654704.438op/s ± 6688.364op/s 5661396.240op/s 5673873.040op/s 5684175.092op/s 5687063.110op/s 0.57% 0.096 0.287 0.21% 830.815op/s 1 200
normalization/normalize_service/normalize_service/[empty string] execution_time 45.245µs 45.508µs ± 0.114µs 45.499µs ± 0.083µs 45.598µs 45.680µs 45.747µs 45.796µs 0.65% 0.046 -0.577 0.25% 0.008µs 1 200
normalization/normalize_service/normalize_service/[empty string] throughput 21835826.179op/s 21974196.975op/s ± 55014.463op/s 21978719.144op/s ± 40221.587op/s 22012877.848op/s 22063793.950op/s 22083842.708op/s 22101813.493op/s 0.56% -0.036 -0.579 0.25% 3890.110op/s 1 200
normalization/normalize_service/normalize_service/test_ASCII execution_time 41.802µs 42.103µs ± 0.215µs 42.057µs ± 0.177µs 42.295µs 42.460µs 42.559µs 42.590µs 1.27% 0.373 -1.114 0.51% 0.015µs 1 200
normalization/normalize_service/normalize_service/test_ASCII throughput 23479576.214op/s 23752159.921op/s ± 120828.433op/s 23777411.398op/s ± 100312.863op/s 23855837.113op/s 23912385.125op/s 23920646.605op/s 23922244.385op/s 0.61% -0.361 -1.128 0.51% 8543.860op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... execution_time [509.477µs; 509.835µs] or [-0.035%; +0.035%] None None None
normalization/normalize_service/normalize_service/A0000000000000000000000000000000000000000000000000... throughput [1961443.367op/s; 1962796.128op/s] or [-0.034%; +0.034%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて execution_time [467.206µs; 467.369µs] or [-0.018%; +0.018%] None None None
normalization/normalize_service/normalize_service/Data🐨dog🐶 繋がっ⛰てて throughput [2139639.383op/s; 2140386.718op/s] or [-0.017%; +0.017%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters execution_time [176.788µs; 176.890µs] or [-0.029%; +0.029%] None None None
normalization/normalize_service/normalize_service/Test Conversion 0f Weird !@#$%^&**() Characters throughput [5653252.964op/s; 5656509.697op/s] or [-0.029%; +0.029%] None None None
normalization/normalize_service/normalize_service/[empty string] execution_time [45.492µs; 45.524µs] or [-0.035%; +0.035%] None None None
normalization/normalize_service/normalize_service/[empty string] throughput [21966572.500op/s; 21981821.451op/s] or [-0.035%; +0.035%] None None None
normalization/normalize_service/normalize_service/test_ASCII execution_time [42.073µs; 42.132µs] or [-0.071%; +0.071%] None None None
normalization/normalize_service/normalize_service/test_ASCII throughput [23735414.263op/s; 23768905.580op/s] or [-0.071%; +0.071%] None None None

Group 2

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching string interning on wordpress profile execution_time 142.175µs 143.124µs ± 0.377µs 143.090µs ± 0.180µs 143.268µs 143.633µs 144.183µs 146.136µs 2.13% 2.946 20.365 0.26% 0.027µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching string interning on wordpress profile execution_time [143.071µs; 143.176µs] or [-0.036%; +0.036%] None None None

Group 3

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
credit_card/is_card_number/ execution_time 4.617µs 4.628µs ± 0.005µs 4.628µs ± 0.003µs 4.631µs 4.638µs 4.642µs 4.643µs 0.32% 0.313 0.042 0.11% 0.000µs 1 200
credit_card/is_card_number/ throughput 215373416.243op/s 216065890.140op/s ± 242707.789op/s 216069058.911op/s ± 157021.133op/s 216244234.905op/s 216462298.391op/s 216550453.320op/s 216579289.969op/s 0.24% -0.306 0.036 0.11% 17162.032op/s 1 200
credit_card/is_card_number/ 3782-8224-6310-005 execution_time 96.302µs 96.488µs ± 0.385µs 96.453µs ± 0.033µs 96.491µs 96.558µs 96.616µs 101.845µs 5.59% 13.572 186.324 0.40% 0.027µs 1 200
credit_card/is_card_number/ 3782-8224-6310-005 throughput 9818834.015op/s 10364127.431op/s ± 39237.712op/s 10367789.869op/s ± 3555.142op/s 10370950.708op/s 10374792.368op/s 10379240.586op/s 10384035.584op/s 0.16% -13.522 185.384 0.38% 2774.525op/s 1 200
credit_card/is_card_number/ 378282246310005 execution_time 88.393µs 88.827µs ± 0.421µs 88.788µs ± 0.119µs 88.913µs 89.170µs 89.317µs 94.081µs 5.96% 9.763 119.310 0.47% 0.030µs 1 200
credit_card/is_card_number/ 378282246310005 throughput 10629191.711op/s 11258021.769op/s ± 51070.282op/s 11262827.617op/s ± 15087.989op/s 11277575.950op/s 11302223.206op/s 11311134.535op/s 11313054.133op/s 0.45% -9.396 113.248 0.45% 3611.214op/s 1 200
credit_card/is_card_number/37828224631 execution_time 4.613µs 4.627µs ± 0.005µs 4.627µs ± 0.003µs 4.630µs 4.635µs 4.641µs 4.642µs 0.32% -0.070 0.148 0.11% 0.000µs 1 200
credit_card/is_card_number/37828224631 throughput 215426726.851op/s 216114693.726op/s ± 245368.502op/s 216112210.341op/s ± 152496.877op/s 216274111.476op/s 216520341.781op/s 216735660.837op/s 216779538.820op/s 0.31% 0.077 0.147 0.11% 17350.173op/s 1 200
credit_card/is_card_number/378282246310005 execution_time 84.776µs 85.466µs ± 0.293µs 85.437µs ± 0.180µs 85.655µs 86.001µs 86.080µs 86.139µs 0.82% 0.147 -0.332 0.34% 0.021µs 1 200
credit_card/is_card_number/378282246310005 throughput 11609082.311op/s 11700703.058op/s ± 40135.826op/s 11704587.618op/s ± 24666.541op/s 11724084.494op/s 11768393.744op/s 11787931.877op/s 11795786.505op/s 0.78% -0.130 -0.334 0.34% 2838.031op/s 1 200
credit_card/is_card_number/37828224631000521389798 execution_time 64.208µs 64.312µs ± 0.073µs 64.309µs ± 0.023µs 64.331µs 64.374µs 64.412µs 65.175µs 1.35% 8.390 97.778 0.11% 0.005µs 1 200
credit_card/is_card_number/37828224631000521389798 throughput 15343188.775op/s 15549203.647op/s ± 17413.360op/s 15549992.003op/s ± 5584.246op/s 15555714.824op/s 15566495.533op/s 15571570.397op/s 15574321.059op/s 0.16% -8.291 96.241 0.11% 1231.310op/s 1 200
credit_card/is_card_number/x371413321323331 execution_time 6.834µs 6.844µs ± 0.004µs 6.844µs ± 0.002µs 6.846µs 6.850µs 6.853µs 6.857µs 0.20% 0.339 1.146 0.06% 0.000µs 1 200
credit_card/is_card_number/x371413321323331 throughput 145829534.650op/s 146114227.936op/s ± 80849.951op/s 146123555.462op/s ± 41166.393op/s 146158111.096op/s 146241197.636op/s 146314224.459op/s 146331742.468op/s 0.14% -0.334 1.142 0.06% 5716.955op/s 1 200
credit_card/is_card_number_no_luhn/ execution_time 4.612µs 4.627µs ± 0.006µs 4.628µs ± 0.004µs 4.632µs 4.638µs 4.643µs 4.644µs 0.34% 0.122 -0.326 0.13% 0.000µs 1 200
credit_card/is_card_number_no_luhn/ throughput 215352618.737op/s 216106487.418op/s ± 286175.190op/s 216092339.532op/s ± 201224.965op/s 216324420.376op/s 216520642.261op/s 216699105.853op/s 216807703.706op/s 0.33% -0.115 -0.330 0.13% 20235.642op/s 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time 76.866µs 77.341µs ± 0.140µs 77.336µs ± 0.101µs 77.442µs 77.558µs 77.630µs 77.666µs 0.43% -0.274 -0.058 0.18% 0.010µs 1 200
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput 12875680.158op/s 12929773.675op/s ± 23368.387op/s 12930570.594op/s ± 16924.148op/s 12946691.996op/s 12969650.953op/s 12983313.766op/s 13009653.163op/s 0.61% 0.284 -0.044 0.18% 1652.394op/s 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time 68.882µs 69.213µs ± 0.134µs 69.207µs ± 0.093µs 69.307µs 69.445µs 69.493µs 69.513µs 0.44% 0.058 -0.366 0.19% 0.009µs 1 200
credit_card/is_card_number_no_luhn/ 378282246310005 throughput 14385792.631op/s 14448169.844op/s ± 27902.416op/s 14449496.841op/s ± 19418.864op/s 14467754.993op/s 14489136.048op/s 14514778.603op/s 14517676.723op/s 0.47% -0.048 -0.363 0.19% 1972.999op/s 1 200
credit_card/is_card_number_no_luhn/37828224631 execution_time 4.614µs 4.628µs ± 0.005µs 4.628µs ± 0.003µs 4.632µs 4.638µs 4.640µs 4.643µs 0.33% 0.024 -0.064 0.12% 0.000µs 1 200
credit_card/is_card_number_no_luhn/37828224631 throughput 215356674.901op/s 216072465.202op/s ± 255304.330op/s 216073833.459op/s ± 162876.412op/s 216232588.430op/s 216495711.068op/s 216667019.655op/s 216722100.790op/s 0.30% -0.017 -0.065 0.12% 18052.742op/s 1 200
credit_card/is_card_number_no_luhn/378282246310005 execution_time 65.092µs 65.502µs ± 0.153µs 65.500µs ± 0.113µs 65.622µs 65.754µs 65.860µs 65.925µs 0.65% 0.085 -0.315 0.23% 0.011µs 1 200
credit_card/is_card_number_no_luhn/378282246310005 throughput 15168788.463op/s 15266684.031op/s ± 35686.812op/s 15267290.601op/s ± 26383.661op/s 15292411.911op/s 15320918.571op/s 15345738.665op/s 15362977.453op/s 0.63% -0.073 -0.318 0.23% 2523.439op/s 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time 64.215µs 64.310µs ± 0.032µs 64.307µs ± 0.017µs 64.327µs 64.362µs 64.396µs 64.411µs 0.16% 0.113 1.365 0.05% 0.002µs 1 200
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput 15525397.361op/s 15549686.751op/s ± 7767.456op/s 15550408.193op/s ± 4139.541op/s 15553764.879op/s 15564096.664op/s 15568658.238op/s 15572620.219op/s 0.14% -0.108 1.363 0.05% 549.242op/s 1 200
credit_card/is_card_number_no_luhn/x371413321323331 execution_time 6.833µs 6.843µs ± 0.003µs 6.843µs ± 0.002µs 6.845µs 6.849µs 6.852µs 6.854µs 0.16% 0.193 1.165 0.05% 0.000µs 1 200
credit_card/is_card_number_no_luhn/x371413321323331 throughput 145895326.789op/s 146126057.905op/s ± 66257.699op/s 146131705.348op/s ± 34805.927op/s 146163404.338op/s 146225972.657op/s 146290998.123op/s 146344822.837op/s 0.15% -0.189 1.164 0.05% 4685.127op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
credit_card/is_card_number/ execution_time [4.628µs; 4.629µs] or [-0.016%; +0.016%] None None None
credit_card/is_card_number/ throughput [216032253.175op/s; 216099527.105op/s] or [-0.016%; +0.016%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 execution_time [96.435µs; 96.541µs] or [-0.055%; +0.055%] None None None
credit_card/is_card_number/ 3782-8224-6310-005 throughput [10358689.461op/s; 10369565.400op/s] or [-0.052%; +0.052%] None None None
credit_card/is_card_number/ 378282246310005 execution_time [88.769µs; 88.886µs] or [-0.066%; +0.066%] None None None
credit_card/is_card_number/ 378282246310005 throughput [11250943.919op/s; 11265099.619op/s] or [-0.063%; +0.063%] None None None
credit_card/is_card_number/37828224631 execution_time [4.626µs; 4.628µs] or [-0.016%; +0.016%] None None None
credit_card/is_card_number/37828224631 throughput [216080688.012op/s; 216148699.441op/s] or [-0.016%; +0.016%] None None None
credit_card/is_card_number/378282246310005 execution_time [85.425µs; 85.507µs] or [-0.048%; +0.048%] None None None
credit_card/is_card_number/378282246310005 throughput [11695140.619op/s; 11706265.498op/s] or [-0.048%; +0.048%] None None None
credit_card/is_card_number/37828224631000521389798 execution_time [64.302µs; 64.322µs] or [-0.016%; +0.016%] None None None
credit_card/is_card_number/37828224631000521389798 throughput [15546790.323op/s; 15551616.971op/s] or [-0.016%; +0.016%] None None None
credit_card/is_card_number/x371413321323331 execution_time [6.843µs; 6.844µs] or [-0.008%; +0.008%] None None None
credit_card/is_card_number/x371413321323331 throughput [146103022.910op/s; 146125432.961op/s] or [-0.008%; +0.008%] None None None
credit_card/is_card_number_no_luhn/ execution_time [4.627µs; 4.628µs] or [-0.018%; +0.018%] None None None
credit_card/is_card_number_no_luhn/ throughput [216066826.289op/s; 216146148.547op/s] or [-0.018%; +0.018%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 execution_time [77.322µs; 77.360µs] or [-0.025%; +0.025%] None None None
credit_card/is_card_number_no_luhn/ 3782-8224-6310-005 throughput [12926535.041op/s; 12933012.309op/s] or [-0.025%; +0.025%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 execution_time [69.195µs; 69.232µs] or [-0.027%; +0.027%] None None None
credit_card/is_card_number_no_luhn/ 378282246310005 throughput [14444302.837op/s; 14452036.850op/s] or [-0.027%; +0.027%] None None None
credit_card/is_card_number_no_luhn/37828224631 execution_time [4.627µs; 4.629µs] or [-0.016%; +0.016%] None None None
credit_card/is_card_number_no_luhn/37828224631 throughput [216037082.477op/s; 216107847.926op/s] or [-0.016%; +0.016%] None None None
credit_card/is_card_number_no_luhn/378282246310005 execution_time [65.481µs; 65.524µs] or [-0.032%; +0.032%] None None None
credit_card/is_card_number_no_luhn/378282246310005 throughput [15261738.182op/s; 15271629.880op/s] or [-0.032%; +0.032%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 execution_time [64.306µs; 64.314µs] or [-0.007%; +0.007%] None None None
credit_card/is_card_number_no_luhn/37828224631000521389798 throughput [15548610.256op/s; 15550763.245op/s] or [-0.007%; +0.007%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 execution_time [6.843µs; 6.844µs] or [-0.006%; +0.006%] None None None
credit_card/is_card_number_no_luhn/x371413321323331 throughput [146116875.225op/s; 146135240.584op/s] or [-0.006%; +0.006%] None None None

Group 4

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
benching deserializing traces from msgpack to their internal representation execution_time 58.555ms 58.830ms ± 0.193ms 58.755ms ± 0.099ms 58.940ms 59.231ms 59.358ms 59.566ms 1.38% 1.159 0.827 0.33% 0.014ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
benching deserializing traces from msgpack to their internal representation execution_time [58.804ms; 58.857ms] or [-0.045%; +0.045%] None None None

Group 5

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
sql/obfuscate_sql_string execution_time 72.995µs 73.282µs ± 0.541µs 73.219µs ± 0.067µs 73.305µs 73.433µs 73.862µs 80.679µs 10.19% 12.877 173.112 0.74% 0.038µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
sql/obfuscate_sql_string execution_time [73.207µs; 73.357µs] or [-0.102%; +0.102%] None None None

Group 6

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time 280.101µs 280.710µs ± 0.360µs 280.656µs ± 0.163µs 280.850µs 281.200µs 282.022µs 283.048µs 0.85% 2.444 11.137 0.13% 0.025µs 1 200
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput 3532967.091op/s 3562395.875op/s ± 4555.771op/s 3563081.220op/s ± 2073.645op/s 3564835.492op/s 3567812.032op/s 3569354.587op/s 3570135.496op/s 0.20% -2.416 10.935 0.13% 322.142op/s 1 200
normalization/normalize_name/normalize_name/bad-name execution_time 27.228µs 27.328µs ± 0.126µs 27.311µs ± 0.029µs 27.345µs 27.409µs 27.473µs 28.968µs 6.07% 11.034 140.159 0.46% 0.009µs 1 200
normalization/normalize_name/normalize_name/bad-name throughput 34520305.830op/s 36592691.341op/s ± 161172.369op/s 36615704.580op/s ± 38410.089op/s 36649640.017op/s 36694722.520op/s 36702717.232op/s 36726667.548op/s 0.30% -10.751 135.174 0.44% 11396.607op/s 1 200
normalization/normalize_name/normalize_name/good execution_time 16.941µs 17.002µs ± 0.026µs 17.002µs ± 0.016µs 17.017µs 17.052µs 17.064µs 17.092µs 0.53% 0.323 0.218 0.16% 0.002µs 1 200
normalization/normalize_name/normalize_name/good throughput 58507898.566op/s 58815866.559op/s ± 91590.301op/s 58817467.280op/s ± 54938.058op/s 58875596.871op/s 58958198.100op/s 59008665.019op/s 59028571.342op/s 0.36% -0.314 0.207 0.16% 6476.412op/s 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... execution_time [280.661µs; 280.760µs] or [-0.018%; +0.018%] None None None
normalization/normalize_name/normalize_name/Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Long-.Too-Lo... throughput [3561764.489op/s; 3563027.261op/s] or [-0.018%; +0.018%] None None None
normalization/normalize_name/normalize_name/bad-name execution_time [27.311µs; 27.346µs] or [-0.064%; +0.064%] None None None
normalization/normalize_name/normalize_name/bad-name throughput [36570354.401op/s; 36615028.281op/s] or [-0.061%; +0.061%] None None None
normalization/normalize_name/normalize_name/good execution_time [16.999µs; 17.006µs] or [-0.022%; +0.022%] None None None
normalization/normalize_name/normalize_name/good throughput [58803173.024op/s; 58828560.093op/s] or [-0.022%; +0.022%] None None None

Group 7

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
write only interface execution_time 1.380µs 3.274µs ± 1.470µs 3.114µs ± 0.023µs 3.135µs 3.174µs 14.435µs 15.514µs 398.24% 7.629 58.135 44.80% 0.104µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
write only interface execution_time [3.070µs; 3.477µs] or [-6.225%; +6.225%] None None None

Group 8

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
tags/replace_trace_tags execution_time 2.673µs 2.721µs ± 0.017µs 2.723µs ± 0.006µs 2.728µs 2.751µs 2.757µs 2.761µs 1.40% -0.614 1.254 0.62% 0.001µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
tags/replace_trace_tags execution_time [2.719µs; 2.723µs] or [-0.086%; +0.086%] None None None

Group 9

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
two way interface execution_time 18.417µs 24.490µs ± 13.669µs 18.663µs ± 0.079µs 19.339µs 46.330µs 47.310µs 150.213µs 704.85% 4.934 37.191 55.67% 0.967µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
two way interface execution_time [22.596µs; 26.384µs] or [-7.735%; +7.735%] None None None

Group 10

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
concentrator/add_spans_to_concentrator execution_time 9.165ms 9.191ms ± 0.015ms 9.189ms ± 0.008ms 9.196ms 9.209ms 9.239ms 9.312ms 1.34% 3.501 23.476 0.16% 0.001ms 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
concentrator/add_spans_to_concentrator execution_time [9.188ms; 9.193ms] or [-0.023%; +0.023%] None None None

Group 11

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
redis/obfuscate_redis_string execution_time 35.723µs 36.441µs ± 1.291µs 35.852µs ± 0.028µs 35.900µs 39.219µs 39.278µs 40.330µs 12.49% 1.724 1.026 3.53% 0.091µs 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
redis/obfuscate_redis_string execution_time [36.262µs; 36.620µs] or [-0.491%; +0.491%] None None None

Group 12

cpu_model git_commit_sha git_commit_date git_branch
Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz cbce086 1733483074 glopes/flush-data-after-stop
scenario metric min mean ± sd median ± mad p75 p95 p99 max peak_to_median_ratio skewness kurtosis cv sem runs sample_size
normalization/normalize_trace/test_trace execution_time 297.262ns 308.787ns ± 13.613ns 303.626ns ± 3.600ns 307.699ns 339.722ns 351.041ns 351.903ns 15.90% 1.798 2.147 4.40% 0.963ns 1 200
scenario metric 95% CI mean Shapiro-Wilk pvalue Ljung-Box pvalue (lag=1) Dip test pvalue
normalization/normalize_trace/test_trace execution_time [306.900ns; 310.673ns] or [-0.611%; +0.611%] None None None

Baseline

Omitted due to size.

Copy link
Contributor

@bantonsson bantonsson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These booleans are woefully undocumented and I'm not completely sure about the expected life cycle, but from reading the code it looks like your interpretation is correct.

I think that it might be better to leave the worker as started if it's restartable and leave the checks in place.

@@ -296,7 +293,9 @@ impl TelemetryWorker {
self.log_err(&e);
}
self.data.started = false;
self.deadlines.clear_pending();
if !self.config.restartable {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be enough to only include self.data.started = false; inside the if statement as well, and leave the exit early checks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it depends. Do we want Start, Stop, Stop to generate two stops? Because that's what PHP ends up generating. The second stop is a noop, but if I moved the assignment self.data.started = false under the condition if !self.config.restartable, then the stops would be effective

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the exit early checks are still in the code, then how would the second Stop be effective?

Copy link
Contributor Author

@cataphract cataphract Jul 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand, what you're proposing is that I reset started = false only if !restartable. In that case started would stay true forever once there is a start. So the early check

 if !self.data.started {
   return BREAK;
 }

would never be hit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I was confusing the early checks.

There is something in the logic that feels a bit broken. The FlushData is also protected by this !self.data.started check. Should it work after a restartable stop?

The things that Stop does, should they happen when the first request ends, or when the worker stops?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should [FlushData] work after a restartable stop?

My guess is yes, otherwise there is no way to send the data that's collected after the stop (build_observability_batch is called only from the handlers of Stop and FlushData), at least not without an intervening start+stop.

The things that Stop does, should they happen when the first request ends, or when the worker stops?

That is a good point. The final flush of the metrics should happen when the worker stops, not when handling Stop. I guess at some point they were the same, but then the restart thing was introduced. But regardless, once we have a way to send metrics after a Stop, for that happen periodic flushes should still happen. So FlushData shouldn't be skipped or unscheduled after a Stop.

ddtelemetry/src/worker/mod.rs Outdated Show resolved Hide resolved
@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch from 4f63a34 to 251bafa Compare July 4, 2024 15:39
@codecov-commenter
Copy link

codecov-commenter commented Jul 4, 2024

Codecov Report

Attention: Patch coverage is 72.50000% with 11 lines in your changes missing coverage. Please review.

Project coverage is 71.12%. Comparing base (fd011c1) to head (cbce086).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
+ Coverage   71.08%   71.12%   +0.04%     
==========================================
  Files         301      301              
  Lines       44528    44533       +5     
==========================================
+ Hits        31651    31673      +22     
+ Misses      12877    12860      -17     
Components Coverage Δ
crashtracker 44.89% <ø> (ø)
crashtracker-ffi 8.41% <ø> (ø)
datadog-alloc 98.73% <ø> (ø)
data-pipeline 90.86% <ø> (ø)
data-pipeline-ffi 86.29% <ø> (ø)
ddcommon 83.28% <ø> (ø)
ddcommon-ffi 68.68% <ø> (ø)
ddtelemetry 59.47% <87.87%> (+0.40%) ⬆️
ddtelemetry-ffi 22.13% <ø> (ø)
dogstatsd 89.45% <ø> (ø)
dogstatsd-client 79.77% <ø> (ø)
ipc 82.86% <ø> (+0.10%) ⬆️
profiling 84.30% <ø> (ø)
profiling-ffi 77.46% <ø> (ø)
serverless 0.00% <ø> (ø)
sidecar 37.96% <0.00%> (-0.06%) ⬇️
sidecar-ffi 0.00% <ø> (ø)
spawn-worker 50.36% <ø> (ø)
tinybytes 94.47% <ø> (ø)
trace-mini-agent 72.38% <ø> (ø)
trace-normalization 98.23% <ø> (ø)
trace-obfuscation 95.77% <ø> (ø)
trace-protobuf 77.67% <ø> (ø)
trace-utils 93.03% <ø> (ø)

@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch 4 times, most recently from 447c409 to a7d11b0 Compare July 12, 2024 16:03
@cataphract cataphract requested review from a team as code owners July 12, 2024 16:03
@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch 2 times, most recently from 9c34541 to e458f75 Compare October 15, 2024 11:47
@cataphract
Copy link
Contributor Author

@bwoebi Can you look at this? I need it for DataDog/dd-trace-php#2735

@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch from e458f75 to b018033 Compare November 11, 2024 11:35
Telemetry workers are functionally dead after a Stop lifecycle action,
provided there's no intervening Start. While AddPoint actions are still
processed, their data is never flushed, since the Stop action handler
unschedules FlushMetrics and FlushData actions.

PHP sends a Stop action at the end of every request via
ddog_sidecar_telemetry_end(), but a Start action is only generated just
after a telemetry worker is spawned.

It is not clear to me whether the intention is to a Start/Stop pair on
every PHP requests (where Stop flushes the metrics) or if the intention
is to to have only such a pair in the first request, with the Stop event
generated by ddog_sidecar_telemetry_end() effectively a noop. It would
appear, judging by [this
comment](#391):

> Also allow the telemetry worker to have a mode where it's continuing
execution after a start-stop cycle, otherwise it won't send any more
metrics afterwards.

that the intention is to keep sending metrics after a Start/Stop pair.
In that case:

* The Stop action handler should not unschedule FlushData and
  FlushMetrics events and
* FlushData, if called outside a Start-Stop pair, should not be a noop.

Finally: swap the order in which FlushData and FlushMetrics are
scheduled so that FlushMetrics runs first and therefore its generated
data can be sent by the next FlushData.
EnqueuedTelemetryData has actions stored directly in an "actions" field,
but also dependencies, configurations and integrations in separate
fields; extract_telemetry_actions() converts from the latter into proper
TelemetryActions.

However, this method was being called only from a branch of
register_service_and_flush_queued_actions() that would be hit when
RuntimeInfo::apps does not already contain the key corresponding to the
(service, env) in question (so, essentially, deps/confs/integrations
would be lost on all but the first call).
@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch from b018033 to b439363 Compare December 5, 2024 12:30
@cataphract cataphract requested a review from a team as a code owner December 5, 2024 12:30
@cataphract
Copy link
Contributor Author

cataphract commented Dec 5, 2024

@cataphract cataphract changed the title telemetry worker: flush data after stops telemetry worker: flush data after stops, and two other fixes Dec 5, 2024
ddtelemetry/src/worker/mod.rs Show resolved Hide resolved
let mut payloads = Vec::new();

if self.data.dependencies.flush_not_empty() {
payloads.push(data::Payload::AppDependenciesLoaded(
data::AppDependenciesLoaded {
dependencies: self.data.dependencies.unflushed().cloned().collect(),
dependencies: self.data.dependencies.drain_unflushed().cloned().collect(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't these already be removed on a successful send by the call to removed_flushed? Yeah the method name has a d too much.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a new commit that uses the existing method.

@Anilm3 Anilm3 requested a review from paullegranddc December 5, 2024 13:45
@cataphract cataphract force-pushed the glopes/flush-data-after-stop branch from 25bdce5 to 67da02b Compare December 5, 2024 18:34
async move {
// This is different from the non-functional:
// match self_arc.read().await.send_payload(&payload).await { ... }
// presumably because the temp read guard would live till end of match
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting find

@cataphract cataphract enabled auto-merge (rebase) December 6, 2024 11:04
@cataphract cataphract merged commit 24f49f5 into main Dec 6, 2024
43 checks passed
@cataphract cataphract deleted the glopes/flush-data-after-stop branch December 6, 2024 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants