[feat](spill) spill and reserve #46230

Closed
wants to merge 10 commits into from


mrhhsg
Member

@mrhhsg mrhhsg commented Dec 31, 2024

Squashed commit: 8c9a7c9 ~ 0e7e42d

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@mrhhsg mrhhsg marked this pull request as draft December 31, 2024 15:16
@mrhhsg
Member Author

mrhhsg commented Dec 31, 2024

run buildall

@doris-robot

TPC-H: Total hot run time: 33859 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 832bfbd0ee30629148e380ecdc7056b6dab8d4b9, data reload: false

------ Round 1 ----------------------------------
q1	17708	6220	6111	6111
q2	2083	322	170	170
q3	10507	1259	752	752
q4	10231	879	445	445
q5	7629	2197	1991	1991
q6	208	183	144	144
q7	907	752	599	599
q8	9231	1370	1187	1187
q9	5507	5174	5226	5174
q10	6737	2306	1872	1872
q11	476	278	257	257
q12	350	357	221	221
q13	17762	3866	3100	3100
q14	239	229	207	207
q15	559	507	495	495
q16	636	623	589	589
q17	577	873	341	341
q18	8003	7223	7275	7223
q19	1229	990	526	526
q20	292	307	189	189
q21	2839	2211	1962	1962
q22	365	344	304	304
Total cold run time: 104075 ms
Total hot run time: 33859 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6204	6241	6274	6241
q2	234	327	230	230
q3	2853	3167	2723	2723
q4	1401	1854	1368	1368
q5	6206	6356	6443	6356
q6	218	193	144	144
q7	2187	2024	1867	1867
q8	3790	3738	3676	3676
q9	9652	9595	9461	9461
q10	3252	3640	3138	3138
q11	608	519	505	505
q12	660	769	621	621
q13	3607	3932	3328	3328
q14	286	316	277	277
q15	577	521	501	501
q16	649	679	650	650
q17	1224	1785	1256	1256
q18	9313	8712	8836	8712
q19	777	1192	1043	1043
q20	2054	2020	1949	1949
q21	6969	5972	5681	5681
q22	613	589	583	583
Total cold run time: 63334 ms
Total hot run time: 60310 ms

@mrhhsg mrhhsg force-pushed the spill_and_reserve branch 3 times, most recently from bf480c5 to 0e64be6 Compare January 1, 2025 15:29
@mrhhsg
Member Author

mrhhsg commented Jan 1, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 34107 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 0e64be65987bc078d07353a403587daf6071eab8, data reload: false

------ Round 1 ----------------------------------
q1	17685	6170	6050	6050
q2	2048	318	177	177
q3	10493	1224	739	739
q4	10199	862	436	436
q5	7628	2235	2026	2026
q6	215	187	154	154
q7	919	769	620	620
q8	9247	1395	1209	1209
q9	5464	5226	5208	5208
q10	6749	2308	1936	1936
q11	476	276	260	260
q12	351	361	217	217
q13	17779	3762	3147	3147
q14	250	238	212	212
q15	570	509	514	509
q16	630	629	603	603
q17	592	862	348	348
q18	8775	7207	7271	7207
q19	1231	964	541	541
q20	299	310	209	209
q21	2775	2153	1996	1996
q22	363	338	303	303
Total cold run time: 104738 ms
Total hot run time: 34107 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6262	6244	6238	6238
q2	229	327	227	227
q3	2932	3219	2760	2760
q4	1409	1909	1388	1388
q5	6229	6375	6431	6375
q6	215	191	147	147
q7	2215	2061	1910	1910
q8	3762	3772	3797	3772
q9	9706	9686	9490	9490
q10	3416	3464	3080	3080
q11	587	516	516	516
q12	682	748	608	608
q13	3609	3911	3327	3327
q14	283	312	296	296
q15	543	513	511	511
q16	673	689	648	648
q17	1257	1750	1267	1267
q18	9248	8760	8652	8652
q19	801	1077	1135	1077
q20	2043	2058	1887	1887
q21	6702	6108	5983	5983
q22	613	601	577	577
Total cold run time: 63416 ms
Total hot run time: 60736 ms

@mrhhsg mrhhsg force-pushed the spill_and_reserve branch 3 times, most recently from 16dad3e to bf2a73b Compare January 2, 2025 10:19
@mrhhsg
Member Author

mrhhsg commented Jan 2, 2025

run buildall

@mrhhsg mrhhsg force-pushed the spill_and_reserve branch 3 times, most recently from a8e5c79 to 393f12e Compare January 3, 2025 09:02
@mrhhsg
Member Author

mrhhsg commented Jan 3, 2025

run buildall

@doris-robot

TPC-H: Total hot run time: 34093 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 393f12ea6ad950b270425928c4ce386f2d2623eb, data reload: false

------ Round 1 ----------------------------------
q1	17735	6236	6163	6163
q2	2042	294	171	171
q3	10606	1306	716	716
q4	10230	879	433	433
q5	8306	2256	1993	1993
q6	221	182	146	146
q7	893	741	621	621
q8	9242	1385	1196	1196
q9	5501	5207	5201	5201
q10	6741	2335	1878	1878
q11	484	274	256	256
q12	355	360	220	220
q13	17795	3950	3159	3159
q14	243	230	227	227
q15	562	516	511	511
q16	628	606	602	602
q17	578	857	332	332
q18	8226	7317	7248	7248
q19	1318	960	549	549
q20	299	308	184	184
q21	2914	2138	1974	1974
q22	370	330	313	313
Total cold run time: 105289 ms
Total hot run time: 34093 ms

----- Round 2, with runtime_filter_mode=off -----
q1	6290	6294	6364	6294
q2	236	328	229	229
q3	2727	3138	2840	2840
q4	1421	1899	1439	1439
q5	6236	6491	6423	6423
q6	214	190	143	143
q7	2363	2001	1841	1841
q8	3711	3678	3622	3622
q9	9582	9458	9525	9458
q10	3417	3529	3104	3104
q11	592	511	483	483
q12	683	757	637	637
q13	3612	4071	3324	3324
q14	302	302	275	275
q15	565	511	488	488
q16	658	695	644	644
q17	1209	1795	1269	1269
q18	9218	8406	8309	8309
q19	750	1095	1037	1037
q20	1964	1996	1854	1854
q21	6908	5690	5849	5690
q22	600	627	569	569
Total cold run time: 63258 ms
Total hot run time: 59972 ms

@doris-robot

TPC-DS: Total hot run time: 191904 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 393f12ea6ad950b270425928c4ce386f2d2623eb, data reload: false

query1	986	382	377	377
query2	6524	2332	2390	2332
query3	6701	215	210	210
query4	30324	24069	23858	23858
query5	4372	638	454	454
query6	281	190	202	190
query7	4636	497	310	310
query8	322	245	252	245
query9	9620	2686	2672	2672
query10	487	319	257	257
query11	17670	15578	15229	15229
query12	164	106	104	104
query13	1667	541	401	401
query14	11193	7922	8157	7922
query15	221	186	197	186
query16	7933	634	480	480
query17	1598	750	574	574
query18	2100	436	300	300
query19	216	182	161	161
query20	122	119	113	113
query21	213	159	99	99
query22	4594	4510	4106	4106
query23	35775	34612	34113	34113
query24	6909	2246	2248	2246
query25	476	441	381	381
query26	1212	270	148	148
query27	2096	475	344	344
query28	5385	2416	2383	2383
query29	686	534	405	405
query30	231	182	148	148
query31	981	895	830	830
query32	71	62	57	57
query33	523	350	280	280
query34	712	839	501	501
query35	788	818	741	741
query36	993	1034	944	944
query37	117	102	75	75
query38	4142	4257	4147	4147
query39	1486	1427	1443	1427
query40	204	123	103	103
query41	49	45	46	45
query42	128	105	104	104
query43	504	512	490	490
query44	1298	800	802	800
query45	178	177	167	167
query46	857	1037	650	650
query47	1896	1923	1845	1845
query48	398	412	311	311
query49	771	471	398	398
query50	619	663	387	387
query51	7032	6987	7556	6987
query52	103	102	91	91
query53	216	254	185	185
query54	471	482	407	407
query55	80	78	81	78
query56	253	269	242	242
query57	1204	1165	1139	1139
query58	229	227	235	227
query59	3156	3176	3130	3130
query60	261	259	257	257
query61	111	105	111	105
query62	895	795	739	739
query63	217	183	198	183
query64	4197	984	634	634
query65	3381	3272	3302	3272
query66	1067	406	318	318
query67	16067	15856	15494	15494
query68	8918	737	504	504
query69	462	294	253	253
query70	1208	1133	1028	1028
query71	424	293	248	248
query72	5749	3625	3466	3466
query73	1166	727	349	349
query74	10078	9341	9006	9006
query75	4077	3111	2627	2627
query76	5083	1270	751	751
query77	941	351	272	272
query78	10543	10434	9694	9694
query79	2112	812	599	599
query80	731	510	458	458
query81	460	277	239	239
query82	474	141	126	126
query83	196	174	145	145
query84	278	89	69	69
query85	747	361	307	307
query86	340	324	305	305
query87	4752	4611	4295	4295
query88	3508	2196	2146	2146
query89	410	344	293	293
query90	2076	187	185	185
query91	133	139	106	106
query92	64	55	56	55
query93	917	907	525	525
query94	665	396	298	298
query95	338	257	251	251
query96	489	600	281	281
query97	2982	3047	2852	2852
query98	220	204	191	191
query99	1642	1557	1442	1442
Total cold run time: 292552 ms
Total hot run time: 191904 ms

@doris-robot

ClickBench: Total hot run time: 31.36 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 393f12ea6ad950b270425928c4ce386f2d2623eb, data reload: false

query1	0.03	0.03	0.03
query2	0.07	0.03	0.04
query3	0.24	0.07	0.07
query4	1.62	0.11	0.11
query5	0.44	0.40	0.39
query6	1.15	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.51	0.52
query10	0.55	0.57	0.55
query11	0.14	0.11	0.11
query12	0.13	0.11	0.11
query13	0.61	0.61	0.59
query14	2.85	2.77	2.90
query15	0.90	0.83	0.82
query16	0.38	0.37	0.38
query17	1.02	1.03	1.06
query18	0.22	0.20	0.20
query19	1.88	1.81	2.00
query20	0.01	0.01	0.01
query21	15.36	0.93	0.59
query22	0.74	0.87	0.67
query23	15.27	1.47	0.60
query24	3.01	0.87	1.62
query25	0.18	0.11	0.07
query26	0.38	0.14	0.14
query27	0.07	0.05	0.04
query28	13.90	1.53	1.04
query29	12.57	3.91	3.26
query30	0.25	0.09	0.07
query31	2.80	0.58	0.38
query32	3.25	0.54	0.47
query33	3.05	3.11	3.19
query34	16.69	5.14	4.56
query35	4.52	4.50	4.51
query36	0.62	0.48	0.49
query37	0.09	0.06	0.06
query38	0.04	0.03	0.04
query39	0.04	0.03	0.02
query40	0.16	0.14	0.13
query41	0.08	0.03	0.02
query42	0.04	0.02	0.03
query43	0.04	0.03	0.02
Total cold run time: 106.03 s
Total hot run time: 31.36 s

@mrhhsg mrhhsg force-pushed the spill_and_reserve branch from 393f12e to 98c3ad8 Compare January 3, 2025 15:17
mrhhsg and others added 4 commits January 6, 2025 11:10
### What problem does this PR solve?

1. Fix the log4j format `%%` error.
2. Change the workload group's (wg) low watermark to 75% and high watermark to 85% to make
spilling to disk more stable.
3. Treat exec_memlimit as a hard limit if the user sets it.
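
As a rough illustration of item 2, here is a minimal sketch of how a low/high watermark pair could be used to classify memory pressure. Only the 75% and 85% values come from the commit message above. The names `MemoryPressure`, `Watermarks`, and `classify` are hypothetical, and what the backend actually does at each level is not stated in this PR.

```cpp
#include <cstdint>

// Hypothetical names throughout; this is not the Doris implementation.
enum class MemoryPressure { kNone, kLow, kHigh };

struct Watermarks {
    double low = 0.75;   // low watermark, value taken from the commit message
    double high = 0.85;  // high watermark, value taken from the commit message
};

// Classify current usage against the group's memory limit.
// Assumption: a non-positive limit means "no limit", so there is no pressure.
inline MemoryPressure classify(int64_t used_bytes, int64_t limit_bytes,
                               const Watermarks& wm = Watermarks{}) {
    if (limit_bytes <= 0) {
        return MemoryPressure::kNone;
    }
    const double ratio =
            static_cast<double>(used_bytes) / static_cast<double>(limit_bytes);
    if (ratio >= wm.high) {
        return MemoryPressure::kHigh;
    }
    if (ratio >= wm.low) {
        return MemoryPressure::kLow;
    }
    return MemoryPressure::kNone;
}
```

With these defaults, a group using 80 of 100 bytes would be classified as `kLow` and one using 90 of 100 as `kHigh`. Keeping a gap between the two thresholds is a common way to avoid flapping between states, which may be the stability the commit message refers to.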
@mrhhsg mrhhsg force-pushed the spill_and_reserve branch 2 times, most recently from 5f902f0 to 29c8b1a Compare January 10, 2025 06:21
…obeSinkOperatorX (apache#46706)

### What problem does this PR solve?

```
*** Query id: 80819fcc223e4a45-b46155de6e0c4eee ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1736352810 (unix time) try "date -d @1736352810" if you are using GNU date ***
*** Current BE git commitID: 08683cb ***
*** SIGSEGV address not mapped to object (@0x38) received by PID 8736 (TID 11549 OR 0x7f8dd0922640) from PID 56; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris_branch-3.0/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F92019CA520 in /lib/x86_64-linux-gnu/libc.so.6
 4# auto doris::pipeline::SetProbeSinkOperatorX::_refresh_hash_table(doris::pipeline::SetProbeSinkLocalState&)::{lambda(auto:1&&)apache#1}::operator(), HashTableNoState>, DefaultHash, HashTableGrower<10ul>, Allocator > >&>(doris::vectorized::MethodSerialized, HashTableNoState>, DefaultHash, HashTableGrower<10ul>, Allocator > >&) const at /root/doris_branch-3.0/doris/be/src/pipeline/exec/set_probe_sink_operator.cpp:213
 5# doris::pipeline::SetProbeSinkOperatorX::_finalize_probe(doris::pipeline::SetProbeSinkLocalState&) at /root/doris_branch-3.0/doris/be/src/pipeline/exec/set_probe_sink_operator.cpp:184
 6# doris::pipeline::SetProbeSinkOperatorX::sink(doris::RuntimeState*, doris::vectorized::Block*, bool) at /root/doris_branch-3.0/doris/be/src/pipeline/exec/set_probe_sink_operator.cpp:98
 7# doris::pipeline::PipelineTask::execute(bool*) at /root/doris_branch-3.0/doris/be/src/pipeline/pipeline_task.cpp:387
 8# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /root/doris_branch-3.0/doris/be/src/pipeline/task_scheduler.cpp:138
 9# doris::ThreadPool::dispatch_thread() in /mnt/ssd01/doris-branch40preview/NEREIDS_ASAN/be/lib/doris_be
10# doris::Thread::supervise_thread(void*) at /root/doris_branch-3.0/doris/be/src/util/thread.cpp:499
11# start_thread at ./nptl/pthread_create.c:442
12# 0x00007F9201AAE850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
@mrhhsg mrhhsg force-pushed the spill_and_reserve branch from 29c8b1a to ac08f9f Compare January 12, 2025 09:32
jacktengg and others added 4 commits January 13, 2025 18:16
…#46570)

### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:
When a rowset contains many segments, opening all of them at once consumes
a lot of memory. This PR opens segments one by one and releases each
`Segment` object immediately if it does not need to be kept for later use,
which dramatically reduces the memory footprint.
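
The idea described above can be sketched as follows. This is a simplified illustration, not the code from the referenced commit: the `Segment` class here is a hypothetical stand-in, and `LiveCounter` and `open_one_by_one` are invented names used only to make the peak number of live objects observable.

```cpp
#include <algorithm>
#include <functional>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical helper that records how many segments are alive at once.
struct LiveCounter {
    int live = 0;
    int peak = 0;
};

// Hypothetical stand-in for a segment; the real class holds file metadata.
class Segment {
public:
    Segment(int id, LiveCounter* counter) : _id(id), _counter(counter) {
        ++_counter->live;
        _counter->peak = std::max(_counter->peak, _counter->live);
    }
    ~Segment() { --_counter->live; }
    Segment(const Segment&) = delete;
    Segment& operator=(const Segment&) = delete;
    int id() const { return _id; }

private:
    int _id;
    LiveCounter* _counter;
};

// Open segments one at a time. A segment is retained only if `keep` says it
// is needed later; otherwise it is destroyed at the end of the iteration.
std::vector<std::shared_ptr<Segment>> open_one_by_one(
        int num_segments, LiveCounter* counter,
        const std::function<bool(const Segment&)>& keep) {
    std::vector<std::shared_ptr<Segment>> kept;
    for (int i = 0; i < num_segments; ++i) {
        auto segment = std::make_shared<Segment>(i, counter);
        if (keep(*segment)) {
            kept.push_back(std::move(segment));
        }
        // If not kept, `segment` is the last owner and is released here.
    }
    return kept;
}
```

In this sketch the number of live segments is bounded by the number kept plus one, regardless of how many segments the rowset has, whereas opening everything up front would hold all of them in memory at the same time.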
@mrhhsg mrhhsg closed this Jan 14, 2025

5 participants