[Enhancement] Support some compress functions #47307

Open: lzyy2024 wants to merge 12 commits into base: master

lzyy2024

What problem does this PR solve?

Added the compress and uncompress functions, similar to those in MySQL.

Issue Number: close #45530

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

Contributor

@zclllyybb left a comment

Remember to format your files.


Status execute_impl(FunctionContext* context, Block& block, const ColumnNumbers& arguments,
uint32_t result, size_t input_rows_count) const override {
// LOG(INFO) << "Executing FunctionCompress with " << input_rows_count
Contributor

Remove these commented-out lines.

col_data[idx] = '0', col_data[idx + 1] = 'x';
for (int i = 0; i < 4; i++) {
unsigned char byte = (value >> (i * 8)) & 0xFF;
col_data[idx + 2 + i * 2] = "0123456789ABCDEF"[byte >> 4]; // 高4位 (high 4 bits)
Contributor

Don't use Chinese in code comments.

Contributor

Also, replace the magic values with named constants.


auto st = compression_codec->compress(data, &compressed_str);

if (!st.ok()) {
Contributor

Add a comment explaining when this can fail.

Contributor

Add test cases like the ones in regression-test/suites/query_p0/sql_functions/test_template_one_arg.groovy.

Contributor

We don't need to modify this file anymore.

std::string func_name = "compress";
InputTypeSet input_types = {TypeIndex::String};

// 压缩多个不同的字符串 (compress multiple different strings)
Contributor

Don't use Chinese comments.

std::string uncompressed;
Slice data;
Slice uncompressed_slice;
for (int row = 0; row < input_rows_count; row++) {
Contributor

Use size_t, not int.

illegal = 1;
} else {
if (data[0] != '0' || data[1] != 'x') {
LOG(INFO) << "illegal: "
Contributor

Don't log info here.

if (x >= 'A' && x <= 'F') return true;
return false;
};
auto trans = [](char x) {
Contributor

Just use from_chars and to_chars in place of your hand-written lambdas.
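
As a rough illustration of this suggestion, the sketch below converts between two hex characters and one byte using std::from_chars and std::to_chars (C++17). The helper names parse_hex_byte and format_hex_byte are hypothetical and are not taken from the PR. Note that std::to_chars writes lowercase digits without padding, so producing the uppercase two-character form seen in the PR's output needs a small extra step.

```cpp
#include <cassert>
#include <cctype>
#include <charconv>
#include <optional>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse exactly two hex characters into one byte.
// Returns std::nullopt if either character is not a hex digit.
inline std::optional<unsigned char> parse_hex_byte(std::string_view two_chars) {
    if (two_chars.size() != 2) return std::nullopt;
    unsigned int value = 0;
    const char* first = two_chars.data();
    const char* last = first + two_chars.size();
    auto [ptr, ec] = std::from_chars(first, last, value, 16);
    // Require that both characters were consumed, not just a valid prefix.
    if (ec != std::errc() || ptr != last) return std::nullopt;
    return static_cast<unsigned char>(value);
}

// Hypothetical helper: format one byte as two uppercase hex characters.
inline std::string format_hex_byte(unsigned char byte) {
    char buf[2];
    auto res = std::to_chars(buf, buf + 2, static_cast<unsigned int>(byte), 16);
    std::string out(buf, res.ptr);                     // lowercase, no padding
    if (out.size() < 2) out.insert(out.begin(), '0');  // left-pad to two digits
    for (char& c : out) {
        c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    return out;
}
```

Checking `ptr != last` matters here: std::from_chars stops at the first non-hex character without reporting an error, so an input such as "1G" would otherwise be accepted as 0x01.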

// Print the compressed string (after compression)
// LOG(INFO) << "Compressed string at row " << row << ": "
// << std::string(reinterpret_cast<const char*>(col_data.data()));
col_offset[row] = col_offset[row - 1] + 10 + compressed_str.size() * 2;
Contributor

What's this value for?

Author

The first ten characters of the compressed value are "0x" followed by eight hex digits that hold the length; after that, each compressed byte is written as two hexadecimal digits.
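
The layout described in this reply can be sketched as follows. This is a minimal illustration, not the PR's code: encode_compressed_hex, kHexDigits, and kHeaderSize are hypothetical names, and the little-endian byte order of the length is taken from the "Little Endian : 0x01000000 -> 1" comment elsewhere in this review.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper: render "0x" + 8 hex digits (length, little-endian)
// followed by two hex digits per compressed byte.
inline std::string encode_compressed_hex(std::uint32_t uncompressed_len,
                                         std::string_view compressed) {
    static constexpr char kHexDigits[] = "0123456789ABCDEF";
    static constexpr std::size_t kHeaderSize = 10;  // "0x" plus 8 hex digits

    std::string out;
    out.reserve(kHeaderSize + 2 * compressed.size());
    out += "0x";
    // Write the 4 length bytes, least significant byte first.
    for (int i = 0; i < 4; ++i) {
        const auto byte = static_cast<unsigned char>((uncompressed_len >> (i * 8)) & 0xFF);
        out += kHexDigits[byte >> 4];    // high 4 bits
        out += kHexDigits[byte & 0x0F];  // low 4 bits
    }
    // Write each compressed byte as two hex digits.
    for (unsigned char byte : compressed) {
        out += kHexDigits[byte >> 4];
        out += kHexDigits[byte & 0x0F];
    }
    return out;
}
```

The resulting size is 10 + 2 * compressed.size(), which is where the offset increment asked about in this thread comes from.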

@lzyy2024 requested a review from zclllyybb January 23, 2025 13:05
Contributor

@zclllyybb left a comment

Fix the correctness.

const auto& str = arg_column.get_data_at(row);
data = Slice(str.data, str.size);

auto st = compression_codec->compress(data, &compressed_str);
Contributor

When will compress fail?

Slice data;
for (size_t row = 0; row < input_rows_count; row++) {
null_map[row] = false;
const auto& str = arg_column.get_data_at(row);
Contributor

No need to use a virtual function here.

\N \N

-- !const_not_nullable --
0x05000000789C73C92FCA2C060005B00202 0x446F726973
Contributor

Carefully review your result!!!

Slice data;
Slice uncompressed_slice;
for (size_t row = 0; row < input_rows_count; row++) {
auto check = [](char x) {
Contributor

Try to use standard library functions first.

const auto& str = arg_column.get_data_at(row);
data = Slice(str.data, str.size);

int illegal = 0;
Contributor

Why not bool?

unsigned char* src = compressed_str.data();
{
for (size_t i = 0; i < compressed_str.size(); i++) {
col_data[idx] =
Contributor

This is too tricky. Try to simplify code like this.

Slice data;
Slice uncompressed_slice;
for (size_t row = 0; row < input_rows_count; row++) {
std::function<bool(char)> check = [](char x) {
Contributor

Why not use isxdigit?
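
A minimal sketch of what an std::isxdigit-based check could look like is shown below. The function name looks_like_compressed_hex is hypothetical, and the even-length requirement is an assumption based on the two-hex-digits-per-byte layout. Also note that std::isxdigit accepts lowercase a-f, whereas the hand-written lambda in the diff appears to accept only uppercase A-F, so the two are not strictly equivalent.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string_view>

// Hypothetical validity check for the textual form "0x" + hex digits.
// Assumes at least 10 characters ("0x" plus an 8-digit length header)
// and an even total length, since every byte takes two hex digits.
inline bool looks_like_compressed_hex(std::string_view s) {
    if (s.size() < 10 || s.size() % 2 != 0) return false;
    if (s[0] != '0' || s[1] != 'x') return false;
    // Take the argument as unsigned char: passing a negative char
    // value to std::isxdigit is undefined behavior.
    return std::all_of(s.begin() + 2, s.end(),
                       [](unsigned char c) { return std::isxdigit(c) != 0; });
}
```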

@@ -854,8 +854,13 @@ class ZlibBlockCompression : public BlockCompressionCodec {
Slice s(*output);

auto zres = ::compress((Bytef*)s.data, &s.size, (Bytef*)input.data, input.size);
if (zres != Z_OK) {
return Status::InvalidArgument("Fail to do ZLib compress, error={}", zError(zres));
if (zres == Z_MEM_ERROR) {
Contributor

Also change the other identical calls.

Contributor

Splitting them into a separate PR may be better.

implements UnaryExpression, ExplicitlyCastableSignature, PropagateNullable {

public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

This comment was marked as resolved.

implements UnaryExpression, ExplicitlyCastableSignature, AlwaysNullable {

public static final List<FunctionSignature> SIGNATURES = ImmutableList.of(
FunctionSignature.ret(StringType.INSTANCE).args(StringType.INSTANCE));

This comment was marked as resolved.


unsigned int length = 0;
for (size_t i = 2; i <= 9; i += 2) {
unsigned char byte = (hex_ctoi.at(data[i]) << 4) + hex_ctoi.at(data[i + 1]);
Contributor

Remove hex_ctoi and just use from_chars.

unsigned int length = 0;
for (size_t i = 2; i <= 9; i += 2) {
unsigned char byte = (hex_ctoi.at(data[i]) << 4) + hex_ctoi.at(data[i + 1]);
length += (byte << (8 * (i / 2 - 1))); //Little Endian : 0x01000000 -> 1

This comment was marked as resolved.
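
For reference, the little-endian length decoding in the snippet above could be written with std::from_chars roughly as follows. This is a sketch under the assumption that the header is "0x" plus eight hex digits, as described earlier in the thread; decode_length_header is a hypothetical name, not a function from the PR.

```cpp
#include <cassert>
#include <charconv>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: read the 4-byte little-endian length stored as
// eight hex digits after the "0x" prefix. Returns std::nullopt on bad input.
inline std::optional<std::uint32_t> decode_length_header(std::string_view data) {
    if (data.size() < 10 || data[0] != '0' || data[1] != 'x') return std::nullopt;
    std::uint32_t length = 0;
    for (std::size_t k = 0; k < 4; ++k) {
        const char* first = data.data() + 2 + 2 * k;  // two hex chars per byte
        const char* last = first + 2;
        unsigned int byte = 0;
        auto [ptr, ec] = std::from_chars(first, last, byte, 16);
        if (ec != std::errc() || ptr != last) return std::nullopt;
        length |= static_cast<std::uint32_t>(byte) << (8 * k);  // byte k is bits 8k..8k+7
    }
    return length;
}
```

With this shape, "0x01000000" decodes to 1, matching the comment in the diff, and a non-hex character in the header yields an empty result instead of an exception from a map lookup.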

std::string uncompressed;
Slice data;
Slice uncompressed_slice;
for (size_t row = 0; row < input_rows_count; row++) {

This comment was marked as resolved.

}
idx += 10;

col_data.resize(col_data.size() + 2 * compressed_str.size());

This comment was marked as resolved.

//Converts a hexadecimal readable string to a compressed byte stream
std::string s(((int)data.size - 10) / 2, ' '); // byte stream data.size >= 10
for (size_t i = 10, j = 0; i < data.size; i += 2, j++) {
s[j] = (hex_ctoi.at(data[i]) << 4) + hex_ctoi.at(data[i + 1]);
Contributor

Ditto.

@lzyy2024
Author

run buildall

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32100 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17575	5496	5376	5376
q2	2052	334	182	182
q3	10468	1303	735	735
q4	10229	969	517	517
q5	7663	2384	2167	2167
q6	191	165	136	136
q7	925	764	608	608
q8	9235	1394	1149	1149
q9	5219	4920	4833	4833
q10	6811	2314	1890	1890
q11	476	280	258	258
q12	341	358	214	214
q13	17760	3661	3052	3052
q14	228	244	206	206
q15	512	471	459	459
q16	646	619	598	598
q17	558	860	317	317
q18	7189	6475	6417	6417
q19	1807	966	537	537
q20	304	319	185	185
q21	2804	2173	1957	1957
q22	356	330	307	307
Total cold run time: 103349 ms
Total hot run time: 32100 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5511	5460	5437	5437
q2	248	327	233	233
q3	2242	2638	2307	2307
q4	1439	1838	1365	1365
q5	4323	4725	4681	4681
q6	167	156	124	124
q7	2080	1986	1810	1810
q8	2656	2811	2662	2662
q9	7293	7156	7173	7156
q10	2932	3258	2769	2769
q11	572	516	494	494
q12	717	749	595	595
q13	3494	3934	3293	3293
q14	267	289	267	267
q15	505	474	464	464
q16	658	693	641	641
q17	1207	1731	1256	1256
q18	7613	7379	7400	7379
q19	768	1157	1031	1031
q20	2009	2029	1866	1866
q21	5644	5218	4986	4986
q22	597	652	555	555
Total cold run time: 52942 ms
Total hot run time: 51371 ms

@doris-robot

TPC-DS: Total hot run time: 184954 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	962	376	368	368
query2	6516	2097	2071	2071
query3	6802	211	218	211
query4	33731	23198	22991	22991
query5	4407	575	438	438
query6	270	185	173	173
query7	4596	496	311	311
query8	282	244	214	214
query9	9565	2688	2704	2688
query10	472	319	261	261
query11	18192	15054	15022	15022
query12	156	112	103	103
query13	1649	517	409	409
query14	9175	7286	6882	6882
query15	251	188	182	182
query16	8042	641	482	482
query17	1621	744	564	564
query18	2108	401	306	306
query19	229	188	155	155
query20	115	109	110	109
query21	212	123	100	100
query22	4110	4433	4267	4267
query23	33827	33022	32880	32880
query24	6450	2291	2288	2288
query25	529	488	377	377
query26	1198	265	156	156
query27	1997	463	333	333
query28	5369	2458	2448	2448
query29	719	545	418	418
query30	234	181	152	152
query31	934	849	774	774
query32	90	59	67	59
query33	496	354	330	330
query34	734	845	492	492
query35	800	817	741	741
query36	978	1063	968	968
query37	120	104	75	75
query38	4136	4246	4012	4012
query39	1461	1381	1398	1381
query40	221	112	100	100
query41	53	49	55	49
query42	118	97	101	97
query43	511	505	477	477
query44	1332	841	806	806
query45	175	173	163	163
query46	853	1032	637	637
query47	1802	1810	1729	1729
query48	380	404	305	305
query49	783	495	390	390
query50	621	641	390	390
query51	4237	4228	4089	4089
query52	111	103	90	90
query53	223	255	184	184
query54	472	480	413	413
query55	81	82	79	79
query56	258	257	245	245
query57	1158	1141	1066	1066
query58	243	237	245	237
query59	3158	2999	2979	2979
query60	275	266	260	260
query61	119	119	115	115
query62	777	729	637	637
query63	240	199	182	182
query64	4436	995	654	654
query65	3214	3196	3159	3159
query66	1064	407	313	313
query67	15922	15579	15428	15428
query68	4284	817	546	546
query69	466	290	261	261
query70	1212	1103	1118	1103
query71	374	281	250	250
query72	5796	3862	3799	3799
query73	648	747	360	360
query74	10488	8941	8914	8914
query75	3156	3155	2682	2682
query76	3139	1143	755	755
query77	492	339	271	271
query78	9923	10078	9404	9404
query79	2446	797	608	608
query80	788	528	465	465
query81	538	316	244	244
query82	348	151	125	125
query83	170	173	154	154
query84	237	88	77	77
query85	754	360	304	304
query86	440	321	306	306
query87	4424	4481	4498	4481
query88	4173	2165	2210	2165
query89	398	321	302	302
query90	1919	190	188	188
query91	137	139	108	108
query92	70	60	55	55
query93	2644	879	535	535
query94	746	408	294	294
query95	338	262	262	262
query96	484	603	277	277
query97	2775	2886	2750	2750
query98	237	198	191	191
query99	1286	1372	1254	1254
Total cold run time: 281702 ms
Total hot run time: 184954 ms

@doris-robot

ClickBench: Total hot run time: 30.68 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 89c394274803c1107b4fbc5b37d8608ba3af107d, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.08	0.04	0.03
query3	0.23	0.07	0.07
query4	1.61	0.11	0.10
query5	0.42	0.42	0.38
query6	1.15	0.66	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.49	0.53
query10	0.55	0.56	0.56
query11	0.14	0.10	0.10
query12	0.13	0.11	0.11
query13	0.60	0.60	0.60
query14	2.85	2.81	2.88
query15	0.89	0.82	0.81
query16	0.39	0.38	0.39
query17	1.05	1.06	1.05
query18	0.22	0.21	0.20
query19	1.90	1.83	2.02
query20	0.02	0.01	0.01
query21	15.36	1.02	0.60
query22	0.75	0.75	0.65
query23	15.37	1.36	0.60
query24	2.91	1.92	0.87
query25	0.16	0.19	0.14
query26	0.23	0.14	0.13
query27	0.06	0.05	0.06
query28	14.17	1.02	0.43
query29	12.60	3.98	3.27
query30	0.26	0.09	0.06
query31	2.82	0.61	0.37
query32	3.24	0.55	0.46
query33	2.99	3.02	3.09
query34	16.54	5.17	4.51
query35	4.52	4.44	4.45
query36	0.65	0.49	0.52
query37	0.10	0.06	0.06
query38	0.05	0.04	0.03
query39	0.03	0.03	0.02
query40	0.17	0.13	0.13
query41	0.08	0.03	0.03
query42	0.04	0.02	0.02
query43	0.04	0.03	0.04
Total cold run time: 106.04 s
Total hot run time: 30.68 s

@lzyy2024
Author

run buildall

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32971 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17843	6172	5422	5422
q2	2040	300	178	178
q3	10412	1224	728	728
q4	10882	965	536	536
q5	8400	2410	2141	2141
q6	192	176	134	134
q7	906	820	595	595
q8	9228	1339	1150	1150
q9	5785	5158	5032	5032
q10	6988	2361	1956	1956
q11	483	290	268	268
q12	344	370	227	227
q13	18216	3998	3387	3387
q14	272	251	243	243
q15	528	482	477	477
q16	649	626	589	589
q17	569	872	337	337
q18	8233	6545	6461	6461
q19	2878	984	543	543
q20	303	310	192	192
q21	2714	2218	2045	2045
q22	362	332	330	330
Total cold run time: 108227 ms
Total hot run time: 32971 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5710	5517	5469	5469
q2	231	318	233	233
q3	2257	2622	2340	2340
q4	1412	1807	1395	1395
q5	4328	4781	4883	4781
q6	165	162	129	129
q7	2091	1922	1827	1827
q8	2687	2805	2659	2659
q9	7273	7260	7262	7260
q10	3020	3212	2759	2759
q11	586	520	498	498
q12	675	790	601	601
q13	3504	3971	3293	3293
q14	284	306	281	281
q15	519	485	467	467
q16	661	677	630	630
q17	1209	1777	1246	1246
q18	7790	7450	7409	7409
q19	765	1154	1076	1076
q20	2000	2050	1919	1919
q21	5653	5124	5012	5012
q22	631	607	571	571
Total cold run time: 53451 ms
Total hot run time: 51855 ms

@doris-robot

TPC-DS: Total hot run time: 191631 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1306	964	932	932
query2	6184	2038	2033	2033
query3	11103	4702	4399	4399
query4	61069	29129	23025	23025
query5	5535	611	458	458
query6	432	204	183	183
query7	5529	511	307	307
query8	331	247	233	233
query9	8032	2708	2701	2701
query10	469	305	259	259
query11	17709	15224	15513	15224
query12	168	122	114	114
query13	1465	546	409	409
query14	11082	7040	6994	6994
query15	210	206	197	197
query16	7241	636	484	484
query17	1201	730	591	591
query18	1910	422	335	335
query19	205	194	165	165
query20	123	114	118	114
query21	225	131	106	106
query22	4449	4470	4543	4470
query23	34433	33834	33260	33260
query24	5996	2367	2297	2297
query25	460	467	404	404
query26	649	279	157	157
query27	1809	459	333	333
query28	4055	2489	2456	2456
query29	525	545	431	431
query30	214	192	158	158
query31	929	915	837	837
query32	64	60	57	57
query33	438	366	306	306
query34	742	872	503	503
query35	816	867	758	758
query36	1033	1051	950	950
query37	115	107	78	78
query38	4310	4362	4265	4265
query39	1508	1448	1442	1442
query40	217	113	103	103
query41	51	51	50	50
query42	124	109	102	102
query43	507	516	494	494
query44	1338	846	857	846
query45	183	173	171	171
query46	873	1054	654	654
query47	1891	1979	1874	1874
query48	396	407	342	342
query49	718	493	409	409
query50	649	707	400	400
query51	4265	4313	4172	4172
query52	111	105	99	99
query53	228	254	200	200
query54	485	513	426	426
query55	81	82	82	82
query56	260	266	244	244
query57	1237	1210	1154	1154
query58	233	231	236	231
query59	3223	3364	3053	3053
query60	279	271	266	266
query61	139	112	117	112
query62	736	720	663	663
query63	225	184	185	184
query64	1286	1034	656	656
query65	3273	3124	3142	3124
query66	689	435	332	332
query67	16065	15658	15451	15451
query68	5022	809	539	539
query69	475	295	264	264
query70	1178	1161	1126	1126
query71	416	286	253	253
query72	6050	3899	3797	3797
query73	803	764	353	353
query74	9860	8792	8698	8698
query75	3220	3156	2703	2703
query76	3796	1195	748	748
query77	536	353	275	275
query78	10087	10047	9345	9345
query79	2453	805	603	603
query80	1199	524	485	485
query81	540	279	227	227
query82	355	160	126	126
query83	242	165	159	159
query84	291	92	70	70
query85	746	342	301	301
query86	377	321	301	301
query87	4546	4476	4482	4476
query88	3486	2174	2136	2136
query89	393	332	292	292
query90	1576	182	188	182
query91	135	133	110	110
query92	61	56	55	55
query93	2163	848	534	534
query94	737	402	294	294
query95	321	260	265	260
query96	489	618	279	279
query97	2815	2894	2811	2811
query98	224	192	193	192
query99	1302	1402	1318	1318
Total cold run time: 309730 ms
Total hot run time: 191631 ms

@doris-robot

ClickBench: Total hot run time: 30.67 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 75e8e78802a7295e88f4b0d103064acfcd5e6e4d, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.04	0.04	0.03
query2	0.07	0.03	0.03
query3	0.25	0.06	0.07
query4	1.61	0.11	0.10
query5	0.42	0.42	0.40
query6	1.17	0.65	0.66
query7	0.02	0.02	0.02
query8	0.04	0.04	0.03
query9	0.58	0.49	0.50
query10	0.56	0.56	0.54
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.60	0.60	0.61
query14	2.85	2.74	2.72
query15	0.90	0.83	0.82
query16	0.39	0.38	0.36
query17	1.05	1.01	1.00
query18	0.24	0.20	0.20
query19	1.86	1.88	2.01
query20	0.01	0.01	0.01
query21	15.36	0.99	0.58
query22	0.77	0.82	0.75
query23	15.21	1.49	0.53
query24	3.25	0.92	0.84
query25	0.17	0.26	0.12
query26	0.18	0.15	0.14
query27	0.05	0.04	0.04
query28	13.60	1.09	0.44
query29	12.60	3.98	3.33
query30	0.26	0.08	0.06
query31	2.84	0.62	0.39
query32	3.23	0.54	0.46
query33	2.97	3.06	2.99
query34	16.60	5.17	4.52
query35	4.61	4.63	4.54
query36	0.65	0.48	0.50
query37	0.09	0.06	0.05
query38	0.05	0.04	0.04
query39	0.03	0.02	0.03
query40	0.16	0.13	0.14
query41	0.08	0.03	0.03
query42	0.04	0.03	0.02
query43	0.04	0.03	0.02
Total cold run time: 105.79 s
Total hot run time: 30.67 s

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32328 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17587	5520	5400	5400
q2	2046	311	168	168
q3	10541	1284	722	722
q4	10240	962	540	540
q5	8273	2482	2182	2182
q6	195	165	135	135
q7	904	774	641	641
q8	9245	1366	1178	1178
q9	5286	4870	4929	4870
q10	6871	2353	1879	1879
q11	456	280	259	259
q12	352	358	216	216
q13	17765	3713	3109	3109
q14	232	240	206	206
q15	536	483	459	459
q16	634	616	600	600
q17	567	876	320	320
q18	7111	6386	6397	6386
q19	1677	953	548	548
q20	312	323	190	190
q21	2862	2253	2005	2005
q22	364	331	315	315
Total cold run time: 104056 ms
Total hot run time: 32328 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5710	5507	5510	5507
q2	237	324	254	254
q3	2251	2600	2267	2267
q4	1411	1801	1361	1361
q5	4357	4730	4650	4650
q6	173	163	129	129
q7	2075	1965	1892	1892
q8	2583	2829	2689	2689
q9	7428	7143	7226	7143
q10	3027	3345	2815	2815
q11	592	521	496	496
q12	672	778	609	609
q13	3498	3913	3403	3403
q14	290	305	283	283
q15	524	479	464	464
q16	631	694	636	636
q17	1240	1724	1262	1262
q18	7669	7556	7362	7362
q19	761	1067	1128	1067
q20	1975	2072	1898	1898
q21	5701	5295	5131	5131
q22	592	572	577	572
Total cold run time: 53397 ms
Total hot run time: 51890 ms

@doris-robot

TPC-DS: Total hot run time: 185720 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	979	388	367	367
query2	6520	2069	2002	2002
query3	6799	218	219	218
query4	33222	23366	23107	23107
query5	4314	612	458	458
query6	285	210	187	187
query7	4590	486	314	314
query8	302	245	229	229
query9	9612	2703	2704	2703
query10	466	306	249	249
query11	17939	15268	15179	15179
query12	157	103	102	102
query13	1670	539	391	391
query14	10394	7012	6984	6984
query15	230	190	186	186
query16	7215	619	477	477
query17	1595	701	574	574
query18	1740	393	312	312
query19	237	190	171	171
query20	122	117	111	111
query21	213	123	103	103
query22	4125	4424	4309	4309
query23	34421	33045	33169	33045
query24	6612	2295	2392	2295
query25	506	475	398	398
query26	1221	277	158	158
query27	1968	461	347	347
query28	5186	2468	2451	2451
query29	607	571	453	453
query30	232	186	168	168
query31	964	891	814	814
query32	73	64	62	62
query33	524	419	307	307
query34	741	838	519	519
query35	794	804	762	762
query36	1022	1063	941	941
query37	121	105	80	80
query38	4089	4163	4004	4004
query39	1492	1380	1454	1380
query40	205	111	103	103
query41	55	52	63	52
query42	120	103	109	103
query43	517	506	484	484
query44	1373	814	816	814
query45	177	170	163	163
query46	859	1031	647	647
query47	1779	1835	1791	1791
query48	388	401	327	327
query49	758	478	397	397
query50	633	670	392	392
query51	4188	4212	4141	4141
query52	101	106	93	93
query53	236	251	196	196
query54	488	496	404	404
query55	83	77	79	77
query56	263	267	242	242
query57	1151	1168	1073	1073
query58	241	227	246	227
query59	3010	2995	2755	2755
query60	277	265	251	251
query61	117	109	113	109
query62	792	720	664	664
query63	217	192	194	192
query64	4076	1017	637	637
query65	3245	3205	3143	3143
query66	906	414	311	311
query67	15870	15809	15640	15640
query68	5346	836	541	541
query69	443	289	253	253
query70	1195	1164	1083	1083
query71	387	282	260	260
query72	5798	3822	3776	3776
query73	655	760	363	363
query74	9923	8945	9249	8945
query75	3187	3129	2656	2656
query76	3199	1183	785	785
query77	481	367	283	283
query78	10003	10019	9345	9345
query79	3024	829	604	604
query80	682	529	446	446
query81	498	277	282	277
query82	423	155	124	124
query83	165	174	153	153
query84	239	89	76	76
query85	787	337	301	301
query86	390	323	305	305
query87	4520	4423	4439	4423
query88	5058	2177	2157	2157
query89	386	327	294	294
query90	1809	192	206	192
query91	131	133	105	105
query92	62	59	56	56
query93	2315	876	541	541
query94	664	421	308	308
query95	334	278	262	262
query96	491	619	290	290
query97	2761	2875	2725	2725
query98	229	205	195	195
query99	1287	1394	1251	1251
Total cold run time: 282296 ms
Total hot run time: 185720 ms

@doris-robot

ClickBench: Total hot run time: 30.3 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 13ebe672a083689491c631868d403d84b840cd3f, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.04
query2	0.06	0.04	0.03
query3	0.24	0.06	0.07
query4	1.61	0.11	0.10
query5	0.43	0.44	0.41
query6	1.16	0.65	0.65
query7	0.02	0.01	0.01
query8	0.04	0.03	0.04
query9	0.59	0.49	0.51
query10	0.55	0.58	0.56
query11	0.14	0.11	0.11
query12	0.13	0.10	0.11
query13	0.63	0.60	0.60
query14	2.73	2.89	2.85
query15	0.89	0.84	0.82
query16	0.39	0.39	0.38
query17	1.01	1.03	1.00
query18	0.22	0.20	0.20
query19	1.86	1.77	2.09
query20	0.02	0.01	0.01
query21	15.37	0.94	0.56
query22	0.75	0.86	0.77
query23	15.15	1.48	0.60
query24	2.94	1.71	0.36
query25	0.28	0.09	0.13
query26	0.34	0.14	0.13
query27	0.05	0.07	0.06
query28	13.57	1.03	0.44
query29	12.58	3.96	3.27
query30	0.25	0.09	0.07
query31	2.82	0.60	0.37
query32	3.25	0.55	0.46
query33	3.01	3.01	3.05
query34	16.61	5.27	4.56
query35	4.51	4.54	4.52
query36	0.65	0.49	0.48
query37	0.10	0.06	0.06
query38	0.04	0.04	0.04
query39	0.03	0.02	0.03
query40	0.17	0.14	0.14
query41	0.09	0.03	0.03
query42	0.04	0.02	0.03
query43	0.04	0.04	0.03
Total cold run time: 105.39 s
Total hot run time: 30.3 s

@lzyy2024
Author

run buildall

@lzyy2024
Author

run buildall

@doris-robot

TPC-H: Total hot run time: 32060 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17599	5513	5327	5327
q2	2048	315	176	176
q3	10559	1307	702	702
q4	10239	974	529	529
q5	8071	2343	2119	2119
q6	197	165	132	132
q7	893	758	619	619
q8	9225	1340	1185	1185
q9	5100	4824	4851	4824
q10	6800	2333	1885	1885
q11	458	275	267	267
q12	347	356	218	218
q13	17763	3681	3140	3140
q14	221	225	206	206
q15	506	462	465	462
q16	635	609	595	595
q17	549	850	325	325
q18	7160	6336	6356	6336
q19	2894	975	532	532
q20	304	322	190	190
q21	2803	2216	1979	1979
q22	372	334	312	312
Total cold run time: 104743 ms
Total hot run time: 32060 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	5588	5452	5489	5452
q2	239	330	235	235
q3	2275	2687	2332	2332
q4	1444	1828	1348	1348
q5	4342	4696	4660	4660
q6	180	164	129	129
q7	2107	1957	1883	1883
q8	2656	2824	2672	2672
q9	7277	7174	7171	7171
q10	2963	3246	2769	2769
q11	583	532	494	494
q12	690	779	682	682
q13	3493	3895	3326	3326
q14	283	293	274	274
q15	518	473	459	459
q16	640	669	620	620
q17	1205	1713	1266	1266
q18	7576	7316	7326	7316
q19	782	1131	1033	1033
q20	2051	2035	1871	1871
q21	5615	5290	5118	5118
q22	628	610	600	600
Total cold run time: 53135 ms
Total hot run time: 51710 ms

@doris-robot

TPC-DS: Total hot run time: 184635 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	972	376	371	371
query2	6509	2041	1988	1988
query3	6789	219	218	218
query4	36458	23342	22916	22916
query5	4386	602	450	450
query6	290	198	188	188
query7	4604	484	306	306
query8	298	246	228	228
query9	9471	2699	2692	2692
query10	465	306	254	254
query11	17964	15196	14913	14913
query12	160	109	102	102
query13	1642	523	385	385
query14	9587	6895	7236	6895
query15	234	191	183	183
query16	7785	609	471	471
query17	1575	702	540	540
query18	2009	393	306	306
query19	233	191	159	159
query20	119	119	118	118
query21	207	123	104	104
query22	4130	4427	4092	4092
query23	34663	32920	32906	32906
query24	6636	2291	2267	2267
query25	481	480	422	422
query26	1040	279	158	158
query27	1985	472	345	345
query28	5057	2492	2452	2452
query29	635	595	448	448
query30	235	190	156	156
query31	979	855	811	811
query32	70	62	62	62
query33	530	374	313	313
query34	750	845	497	497
query35	848	872	749	749
query36	964	1009	955	955
query37	126	97	89	89
query38	4123	4125	4075	4075
query39	1425	1389	1379	1379
query40	203	121	104	104
query41	52	55	50	50
query42	119	102	107	102
query43	504	518	476	476
query44	1296	797	803	797
query45	181	168	166	166
query46	862	1027	651	651
query47	1818	1866	1778	1778
query48	393	410	314	314
query49	778	494	390	390
query50	666	654	409	409
query51	4171	4153	4129	4129
query52	105	104	90	90
query53	223	252	185	185
query54	486	494	405	405
query55	82	80	85	80
query56	255	270	235	235
query57	1157	1163	1091	1091
query58	254	235	246	235
query59	3153	3217	2880	2880
query60	276	280	260	260
query61	117	120	121	120
query62	832	726	644	644
query63	225	191	210	191
query64	3589	1020	677	677
query65	3227	3155	3161	3155
query66	934	416	315	315
query67	15861	15984	15412	15412
query68	4284	842	528	528
query69	464	291	259	259
query70	1208	1135	1149	1135
query71	372	280	258	258
query72	5864	3841	3977	3841
query73	654	750	371	371
query74	10124	8927	8995	8927
query75	3185	3149	2621	2621
query76	3217	1160	775	775
query77	470	370	274	274
query78	10087	10007	9497	9497
query79	3152	814	585	585
query80	1499	530	438	438
query81	568	281	240	240
query82	715	152	129	129
query83	183	170	155	155
query84	238	100	73	73
query85	800	388	300	300
query86	427	300	297	297
query87	4543	4480	4275	4275
query88	5047	2183	2163	2163
query89	402	331	297	297
query90	1786	191	186	186
query91	138	135	111	111
query92	66	58	54	54
query93	2742	895	537	537
query94	740	409	297	297
query95	330	261	260	260
query96	488	612	279	279
query97	2758	2849	2715	2715
query98	238	205	196	196
query99	1327	1378	1258	1258
Total cold run time: 286369 ms
Total hot run time: 184635 ms

@doris-robot

ClickBench: Total hot run time: 31.04 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 9263dea5e49d60ca40fe41a6ec858405ae8202f9, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.06	0.03	0.03
query2	0.07	0.03	0.03
query3	0.24	0.07	0.07
query4	1.61	0.10	0.10
query5	0.41	0.42	0.39
query6	1.16	0.66	0.67
query7	0.02	0.02	0.01
query8	0.04	0.03	0.03
query9	0.58	0.51	0.52
query10	0.54	0.57	0.55
query11	0.14	0.11	0.10
query12	0.14	0.11	0.11
query13	0.61	0.61	0.60
query14	2.84	2.75	2.81
query15	0.89	0.83	0.82
query16	0.36	0.37	0.37
query17	1.05	1.01	1.00
query18	0.22	0.21	0.22
query19	1.99	2.05	1.88
query20	0.01	0.01	0.01
query21	15.36	0.94	0.59
query22	0.74	0.90	0.62
query23	15.20	1.49	0.59
query24	3.29	1.33	1.69
query25	0.15	0.17	0.10
query26	0.29	0.15	0.14
query27	0.06	0.05	0.04
query28	14.06	1.06	0.43
query29	12.53	4.07	3.24
query30	0.25	0.08	0.06
query31	2.83	0.62	0.39
query32	3.23	0.55	0.46
query33	2.95	3.03	3.03
query34	16.53	5.26	4.52
query35	4.46	4.48	4.50
query36	0.65	0.51	0.47
query37	0.10	0.06	0.06
query38	0.05	0.04	0.04
query39	0.03	0.02	0.02
query40	0.16	0.13	0.13
query41	0.08	0.02	0.03
query42	0.03	0.03	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.05 s
Total hot run time: 31.04 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 42.08% (10979/26093)
Line Coverage: 32.35% (92831/286929)
Region Coverage: 31.50% (47590/151083)
Branch Coverage: 27.54% (24105/87524)
Coverage Report: http://coverage.selectdb-in.cc/coverage/9263dea5e49d60ca40fe41a6ec858405ae8202f9_9263dea5e49d60ca40fe41a6ec858405ae8202f9/report/index.html

}

// first ten digits represent the length of the uncompressed string
col_data.resize(col_data.size() + 10);
Contributor

The resize calls at L111 and L120 could be merged.

if (data[0] != '0' || data[1] != 'x') {
illegal = true;
}
for (size_t i = 2; i <= 9; i += 2) {
Contributor

Why += 2 here? Shouldn't it be ++?

}

{
std::string func_name = "uncompress";
Contributor

We should add some cases that uncompress a string made from a valid compressed string with minor modifications.
For example, if '0x1204' is valid, try to uncompress '0x12F4', which is invalid, and expect a NULL result.

}

uncompressed.resize(length);
uncompressed_slice = Slice(uncompressed);
Contributor

Remove uncompressed. Just resize col_data to grow it by length, and point uncompressed_slice into col_data. Then we can decompress in place and avoid the heavy memcpy (at L237).
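
The idea in this comment can be sketched generically as below: grow the output buffer first, then let the decoder write straight into the new tail, so no temporary string or second copy is needed. This is only an illustration under stated assumptions. append_decoded is a hypothetical name, a plain std::vector<char> stands in for the column's character buffer, and the decoder is any callable taking (char* dst, std::size_t len) and returning whether it succeeded; the real codec interface in Doris works on Slice objects and returns a Status.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch: decode `length` bytes directly into the tail of
// `col_data`. On failure the buffer is shrunk back to its old size.
template <typename Decoder>
bool append_decoded(std::vector<char>& col_data, std::size_t length, Decoder&& decode) {
    const std::size_t old_size = col_data.size();
    col_data.resize(old_size + length);  // reserve space for the output row
    if (!decode(col_data.data() + old_size, length)) {
        col_data.resize(old_size);       // roll back so no partial row remains
        return false;
    }
    return true;
}
```

The rollback on failure is a design choice of this sketch: it keeps the buffer consistent with the offsets when a row turns out to be invalid and should become NULL.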

Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Enhancement](good-first-issue) Support some compress functions
4 participants