[pipelineX](dependency) split different dependencies #27366

Merged 6 commits on Nov 22, 2023

Gabriel39
Contributor

Proposed changes

Issue Number: close #xxx

@Gabriel39 Gabriel39 force-pushed the dev_1121 branch 2 times, most recently from cc65e79 to 6b1d7fa, on November 21, 2023 11:04
@Gabriel39
Contributor Author

run buildall

Contributor

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

@@ -79,37 +83,46 @@ Status AggLocalState::init(RuntimeState* state, LocalStateInfo& info) {
return Status::OK();
}

Status AggLocalState::_destroy_agg_status(vectorized::AggregateDataPtr data) {

warning: method '_destroy_agg_status' can be made static [readability-convert-member-functions-to-static]

be/src/pipeline/exec/aggregation_source_operator.h:110:

-     Status _destroy_agg_status(vectorized::AggregateDataPtr data);
+     static Status _destroy_agg_status(vectorized::AggregateDataPtr data);

- for (size_t i = 0; i < _shared_state->aggregate_evaluators.size(); ++i)
-     _shared_state->aggregate_evaluators[i]->insert_result_info(
-             mapped + _dependency->offsets_of_aggregate_states()[i],
+ for (size_t i = 0; i < shared_state.aggregate_evaluators.size(); ++i)

warning: statement should be inside braces [readability-braces-around-statements]

Suggested change
- for (size_t i = 0; i < shared_state.aggregate_evaluators.size(); ++i)
+ for (size_t i = 0; i < shared_state.aggregate_evaluators.size(); ++i) {

be/src/pipeline/exec/aggregation_source_operator.cpp:443:

-                                         value_columns[i].get());
+                                         value_columns[i].get());
+ }
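
For context on what this check guards against, here is a small sketch of how an unbraced loop body covers only one statement. The function names are hypothetical and the code is not taken from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Unbraced: only the single statement after `for` is the loop body.
inline std::size_t visits_unbraced(const std::vector<int>& v, int& sum) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < v.size(); ++i)
        sum += v[i];
    ++visited;  // outside the loop: runs exactly once
    return visited;
}

// Braced: both statements are unambiguously part of the loop body.
inline std::size_t visits_braced(const std::vector<int>& v, int& sum) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        sum += v[i];
        ++visited;
    }
    return visited;
}
```

Both versions compute the same sum, but only the braced one counts every element, which is the kind of divergence the braces make visible at a glance.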

@@ -175,6 +175,68 @@ void HashJoinBuildSinkLocalState::init_short_circuit_for_probe() {
p._join_op == TJoinOp::LEFT_ANTI_JOIN);
}

Status HashJoinBuildSinkLocalState::_do_evaluate(vectorized::Block& block,

warning: method '_do_evaluate' can be made static [readability-convert-member-functions-to-static]

Suggested change
- Status HashJoinBuildSinkLocalState::_do_evaluate(vectorized::Block& block,
+ static Status HashJoinBuildSinkLocalState::_do_evaluate(vectorized::Block& block,
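
A note on applying this kind of suggestion: in C++ the `static` keyword is allowed only on the declaration inside the class body, not on an out-of-class definition, so the automated edit shown here would not compile as written. The change belongs in the header, as in the `aggregation_source_operator.h` diff earlier in this review. A minimal sketch with a hypothetical class (not the Doris type):

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a local-state class; not the Doris code.
class LocalStateSketch {
public:
    // `static` appears here, on the in-class declaration only.
    static int count_columns(const std::string& csv);
};

// Out-of-class definition: repeating `static` here would be a compile error.
int LocalStateSketch::count_columns(const std::string& csv) {
    if (csv.empty()) {
        return 0;
    }
    int n = 1;
    for (char c : csv) {
        if (c == ',') {
            ++n;
        }
    }
    return n;
}
```

The function can then be called without an instance, for example `LocalStateSketch::count_columns("a,b,c")`.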

return results;
}

Status HashJoinBuildSinkLocalState::_extract_join_column(

warning: method '_extract_join_column' can be made static [readability-convert-member-functions-to-static]

Suggested change
- Status HashJoinBuildSinkLocalState::_extract_join_column(
+ static Status HashJoinBuildSinkLocalState::_extract_join_column(

@@ -402,6 +402,48 @@ Status HashJoinProbeOperatorX::pull(doris::RuntimeState* state, vectorized::Bloc
return Status::OK();
}

Status HashJoinProbeLocalState::_extract_join_column(vectorized::Block& block,

warning: method '_extract_join_column' can be made static [readability-convert-member-functions-to-static]

Suggested change
- Status HashJoinProbeLocalState::_extract_join_column(vectorized::Block& block,
+ static Status HashJoinProbeLocalState::_extract_join_column(vectorized::Block& block,

@@ -93,6 +93,16 @@ class StreamingAggSinkLocalState final
Status _pre_agg_with_serialized_key(doris::vectorized::Block* in_block,
doris::vectorized::Block* out_block);
bool _should_expand_preagg_hash_tables();
void _make_nullable_output_key(vectorized::Block* block) {

warning: method '_make_nullable_output_key' can be made static [readability-convert-member-functions-to-static]

Suggested change
- void _make_nullable_output_key(vectorized::Block* block) {
+ static void _make_nullable_output_key(vectorized::Block* block) {

be/src/pipeline/pipeline_x/dependency.h (conversation resolved)
@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 8eebe6a4eaa1af2522c0f797b5ee02bcf0a276e1, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4926	4635	4692	4635
q2	359	152	164	152
q3	2030	1921	1896	1896
q4	1382	1244	1199	1199
q5	3963	3946	4060	3946
q6	244	132	134	132
q7	1409	872	895	872
q8	2720	2767	2758	2758
q9	9824	9651	9485	9485
q10	3476	3525	3524	3524
q11	374	242	243	242
q12	446	293	290	290
q13	4569	3824	3808	3808
q14	319	290	292	290
q15	586	537	523	523
q16	671	582	583	582
q17	1123	988	948	948
q18	7706	7394	7258	7258
q19	1648	1694	1677	1677
q20	534	328	304	304
q21	4372	3929	3972	3929
q22	481	370	367	367
Total cold run time: 53162 ms
Total hot run time: 48817 ms
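
The totals appear to follow from the rows: summing the first numeric column reproduces the cold total (53162 ms), and summing the last column reproduces the hot total (48817 ms), where the last column is the smaller of the two middle values. The sketch below reproduces that arithmetic under the assumption that the columns are one cold run, two hot runs, and the best hot run. The robot's actual script is not shown in this thread, and the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One table row; the field names are assumptions about what each column means.
struct QueryTiming {
    long cold_ms;
    long hot1_ms;
    long hot2_ms;
};

// Best hot time for a query: the faster of the two hot runs.
inline long best_hot_ms(const QueryTiming& t) {
    return std::min(t.hot1_ms, t.hot2_ms);
}

// Sum of the cold-run column.
inline long total_cold_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const QueryTiming& t : rows) {
        total += t.cold_ms;
    }
    return total;
}

// Sum of the per-query best hot times.
inline long total_hot_ms(const std::vector<QueryTiming>& rows) {
    long total = 0;
    for (const QueryTiming& t : rows) {
        total += best_hot_ms(t);
    }
    return total;
}
```

Fed the q1 to q3 rows above, this gives 7315 ms cold and 6683 ms hot; over all 22 rows it matches the reported totals.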

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4597	4558	4573	4558
q2	338	228	258	228
q3	3996	4006	3995	3995
q4	2704	2704	2733	2704
q5	9672	9671	9663	9663
q6	243	126	126	126
q7	3026	2475	2460	2460
q8	4420	4414	4401	4401
q9	13200	13085	13096	13085
q10	4097	4187	4186	4186
q11	761	629	626	626
q12	973	812	797	797
q13	4261	3607	3572	3572
q14	391	346	347	346
q15	577	522	521	521
q16	758	686	669	669
q17	3878	3886	3837	3837
q18	9505	8877	9058	8877
q19	1820	1767	1762	1762
q20	2377	2092	2067	2067
q21	8866	8619	8669	8619
q22	879	762	804	762
Total cold run time: 81339 ms
Total hot run time: 77861 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.63 seconds
stream load tsv: 580 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.1 seconds inserted 10000000 Rows, about 355K ops/s
storage size: 17158197435 Bytes
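
The reported rates are consistent with bytes divided by seconds and then by 1024 × 1024, truncated to an integer, and with the insert rate being rows divided by seconds, truncated to thousands. The sketch below reproduces that arithmetic. It is an inference from the numbers in this comment, not the robot's actual code, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed formula: bytes / seconds / (1024 * 1024), truncated toward zero.
constexpr std::int64_t mb_per_sec(std::int64_t bytes, std::int64_t seconds) {
    return bytes / seconds / (1024 * 1024);
}

// Assumed formula: rows / seconds, expressed in thousands and truncated.
constexpr std::int64_t k_ops_per_sec(std::int64_t rows, double seconds) {
    return static_cast<std::int64_t>(static_cast<double>(rows) / seconds / 1000.0);
}

// Compile-time checks against the figures reported in this comment.
static_assert(mb_per_sec(74807831229LL, 580) == 123, "tsv load rate");
static_assert(mb_per_sec(2358488459LL, 18) == 124, "json load rate");
static_assert(k_ops_per_sec(10000000, 28.1) == 355, "insert rate");
```

Under this reading, "MB/s" means 1,048,576 bytes per second, which is why the TSV load shows 123 rather than roughly 129.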

@Gabriel39
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.56% (8448/23108)
Line Coverage: 28.86% (68674/237977)
Region Coverage: 27.82% (35511/127641)
Branch Coverage: 24.57% (18111/73704)
Coverage Report: http://coverage.selectdb-in.cc/coverage/8eebe6a4eaa1af2522c0f797b5ee02bcf0a276e1_8eebe6a4eaa1af2522c0f797b5ee02bcf0a276e1/report/index.html

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 2fe4b071609424618746b650fcad92335b222230, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4868	4649	4666	4649
q2	355	158	155	155
q3	2094	1998	1907	1907
q4	1383	1273	1234	1234
q5	3955	3963	4031	3963
q6	242	131	131	131
q7	1426	893	900	893
q8	2715	2783	2747	2747
q9	9728	9744	9595	9595
q10	3448	3539	3521	3521
q11	378	250	254	250
q12	435	296	304	296
q13	4587	3828	3868	3828
q14	315	285	280	280
q15	586	539	521	521
q16	667	587	581	581
q17	1125	970	952	952
q18	7811	7307	7347	7307
q19	1672	1669	1686	1669
q20	542	297	297	297
q21	4370	3948	3911	3911
q22	477	367	367	367
Total cold run time: 53179 ms
Total hot run time: 49054 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4605	4582	4563	4563
q2	345	212	272	212
q3	4012	3999	4004	3999
q4	2693	2685	2687	2685
q5	9633	9598	9683	9598
q6	242	121	127	121
q7	3009	2491	2448	2448
q8	4432	4433	4460	4433
q9	13246	13117	13026	13026
q10	4126	4198	4196	4196
q11	789	647	659	647
q12	970	821	829	821
q13	4272	3605	3617	3605
q14	383	343	359	343
q15	580	518	523	518
q16	744	661	689	661
q17	3959	3853	3860	3853
q18	9505	9083	8927	8927
q19	1799	1792	1763	1763
q20	2387	2082	2047	2047
q21	8783	8684	8574	8574
q22	932	768	802	768
Total cold run time: 81446 ms
Total hot run time: 77808 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.78 seconds
stream load tsv: 585 seconds loaded 74807831229 Bytes, about 121 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 34 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 29.1 seconds inserted 10000000 Rows, about 343K ops/s
storage size: 17098446774 Bytes

@Gabriel39
Contributor Author

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.55% (8447/23108)
Line Coverage: 28.85% (68678/238035)
Region Coverage: 27.82% (35522/127671)
Branch Coverage: 24.57% (18117/73726)
Coverage Report: http://coverage.selectdb-in.cc/coverage/2fe4b071609424618746b650fcad92335b222230_2fe4b071609424618746b650fcad92335b222230/report/index.html

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.55% (8445/23108)
Line Coverage: 28.84% (68656/238032)
Region Coverage: 27.81% (35501/127670)
Branch Coverage: 24.56% (18106/73726)
Coverage Report: http://coverage.selectdb-in.cc/coverage/edbe7cbbb060e186d11bdbdd5b57a1dc24ce163b_edbe7cbbb060e186d11bdbdd5b57a1dc24ce163b/report/index.html

@Gabriel39
Contributor Author

run buildall

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 47.37 seconds
stream load tsv: 581 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 29.0 seconds inserted 10000000 Rows, about 344K ops/s
storage size: 17099208045 Bytes

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 28d200c9a979a2d5cb7c1e9e0a742e4c870c8c77, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4893	4700	4666	4666
q2	355	183	162	162
q3	2024	1905	1902	1902
q4	1388	1243	1242	1242
q5	3960	3938	3997	3938
q6	249	138	132	132
q7	1397	896	887	887
q8	2733	2774	2744	2744
q9	9720	9581	9541	9541
q10	10244	3534	3515	3515
q11	388	242	254	242
q12	434	294	293	293
q13	4538	3820	3780	3780
q14	336	299	289	289
q15	595	537	529	529
q16	665	584	582	582
q17	1118	981	967	967
q18	7754	7482	7441	7441
q19	1667	1656	1648	1648
q20	538	306	289	289
q21	4349	3925	3948	3925
q22	473	378	383	378
Total cold run time: 59818 ms
Total hot run time: 49092 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4598	4607	4571	4571
q2	340	237	317	237
q3	3995	3974	3970	3970
q4	2706	2697	2691	2691
q5	9582	9570	9557	9557
q6	240	121	128	121
q7	3002	2488	2504	2488
q8	4410	4416	4413	4413
q9	13157	13077	12937	12937
q10	4089	4194	4171	4171
q11	805	661	675	661
q12	970	816	804	804
q13	4283	3555	3543	3543
q14	375	352	368	352
q15	576	516	527	516
q16	732	669	654	654
q17	3900	3818	3852	3818
q18	9536	8933	8993	8933
q19	1786	1776	1754	1754
q20	2387	2108	2057	2057
q21	8676	8661	8612	8612
q22	884	842	762	762
Total cold run time: 81029 ms
Total hot run time: 77622 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.81 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.8 seconds inserted 10000000 Rows, about 347K ops/s
storage size: 17100889066 Bytes

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.55% (8445/23107)
Line Coverage: 28.84% (68657/238030)
Region Coverage: 27.81% (35505/127672)
Branch Coverage: 24.56% (18105/73730)
Coverage Report: http://coverage.selectdb-in.cc/coverage/28d200c9a979a2d5cb7c1e9e0a742e4c870c8c77_28d200c9a979a2d5cb7c1e9e0a742e4c870c8c77/report/index.html

@Gabriel39
Contributor Author

run buildall

yiguolei
yiguolei previously approved these changes Nov 22, 2023
Contributor

@yiguolei yiguolei left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 22, 2023
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

@Gabriel39
Contributor Author

run buildall

@github-actions github-actions bot removed the approved Indicates a PR has been approved by one committer. label Nov 22, 2023
@Gabriel39
Contributor Author

run buildall

1 similar comment
@hello-stephen
Contributor

run buildall

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 22, 2023
Contributor

PR approved by at least one committer and no changes requested.

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 36.56% (8448/23107)
Line Coverage: 28.85% (68671/238032)
Region Coverage: 27.81% (35509/127672)
Branch Coverage: 24.57% (18112/73730)
Coverage Report: http://coverage.selectdb-in.cc/coverage/80faa1e308bc1c93436b6afcef9e1d481fe65ed4_80faa1e308bc1c93436b6afcef9e1d481fe65ed4/report/index.html

Contributor

@HappenLee HappenLee left a comment

LGTM

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 80faa1e308bc1c93436b6afcef9e1d481fe65ed4, data reload: false

run tpch-sf100 query with default conf and session variables
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4874	4641	4634	4634
q2	367	155	157	155
q3	2013	1933	1919	1919
q4	1384	1249	1246	1246
q5	3965	3952	3979	3952
q6	247	126	126	126
q7	1430	872	897	872
q8	2734	2760	2755	2755
q9	9873	9655	9578	9578
q10	3462	3513	3516	3513
q11	383	246	253	246
q12	441	296	297	296
q13	4581	3815	3788	3788
q14	313	297	283	283
q15	587	535	531	531
q16	676	588	587	587
q17	1115	929	896	896
q18	7881	7309	7430	7309
q19	1688	1645	1671	1645
q20	564	323	306	306
q21	4267	3969	3935	3935
q22	478	368	368	368
Total cold run time: 53323 ms
Total hot run time: 48940 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
query	cold run (ms)	hot run 1 (ms)	hot run 2 (ms)	best hot run (ms)
q1	4555	4591	4569	4569
q2	344	226	282	226
q3	3991	3991	3964	3964
q4	2684	2678	2679	2678
q5	9664	9634	9583	9583
q6	240	121	121	121
q7	3028	2471	2445	2445
q8	4458	4430	4442	4430
q9	13181	13079	13087	13079
q10	4129	4184	4221	4184
q11	740	644	663	644
q12	995	803	808	803
q13	4257	3551	3563	3551
q14	394	347	349	347
q15	575	524	532	524
q16	738	693	694	693
q17	3890	3889	3834	3834
q18	9533	9025	9079	9025
q19	1806	1761	1773	1761
q20	2375	2083	2072	2072
q21	8806	8853	8418	8418
q22	926	772	756	756
Total cold run time: 81309 ms
Total hot run time: 77707 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 46.13 seconds
stream load tsv: 583 seconds loaded 74807831229 Bytes, about 122 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17099676227 Bytes

@Gabriel39 Gabriel39 merged commit 5442e8d into apache:master Nov 22, 2023
18 of 19 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
Labels
approved Indicates a PR has been approved by one committer. reviewed
6 participants