Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hotfix](jdbc catalog) fix realColumnNames serialize npe #27280

Merged
merged 1 commit into from
Nov 22, 2023

Conversation

zy-kkk
Copy link
Member

@zy-kkk zy-kkk commented Nov 20, 2023

Proposed changes

Issue Number: close #xxx

In the previous PR #27124, we used objectMapper.readValue for deserialization. However, this method does not handle null fields, which can lead to issues when upgrading from older versions. Specifically, if a required field is missing in the persistent data, String realColumnNamesJson = serializeMap.get(REAL_COLUMNS); will return null, resulting in deserialization errors and frontend startup failure. This issue is likely to occur when upgrading from an older version that uses Jdbc Catalog to a new version including PR #27124. As this represents a specific upgrade scenario involving compatibility with old version data structures, it was not covered in the regular PR test cases. Given the specificity and difficulty in replicating such a scenario, no special test cases were added for this PR.

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@zy-kkk
Copy link
Member Author

zy-kkk commented Nov 20, 2023

run buildall

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 370ee325326a2c5863c3a593376750992c42877d, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4888	4682	4634	4634
q2	362	160	159	159
q3	2036	1931	1928	1928
q4	1386	1321	1251	1251
q5	3975	3962	4015	3962
q6	246	131	130	130
q7	1423	878	907	878
q8	2768	2809	2772	2772
q9	9697	9630	9582	9582
q10	3493	3547	3547	3547
q11	385	257	248	248
q12	434	300	294	294
q13	4559	3821	3835	3821
q14	318	289	287	287
q15	586	541	511	511
q16	659	582	580	580
q17	1139	963	943	943
q18	7811	7429	7281	7281
q19	1681	1679	1692	1679
q20	567	307	293	293
q21	4386	3938	3924	3924
q22	477	378	373	373
Total cold run time: 53276 ms
Total hot run time: 49077 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4626	4599	4564	4564
q2	345	218	260	218
q3	3982	3975	3978	3975
q4	2698	2671	2674	2671
q5	9549	9590	9590	9590
q6	241	124	127	124
q7	3008	2446	2463	2446
q8	4430	4434	4472	4434
q9	13226	13041	13147	13041
q10	4118	4200	4213	4200
q11	774	644	655	644
q12	979	812	813	812
q13	4288	3585	3551	3551
q14	385	343	342	342
q15	570	526	527	526
q16	726	670	672	670
q17	3836	3922	3887	3887
q18	9485	9119	9156	9119
q19	1845	1782	1769	1769
q20	2402	2053	2026	2026
q21	8764	8556	8633	8556
q22	838	843	769	769
Total cold run time: 81115 ms
Total hot run time: 77934 ms

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 44.32 seconds
stream load tsv: 569 seconds loaded 74807831229 Bytes, about 125 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.3 seconds inserted 10000000 Rows, about 353K ops/s
storage size: 17099787466 Bytes

@zy-kkk
Copy link
Member Author

zy-kkk commented Nov 21, 2023

run buildall

@doris-robot
Copy link

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 370ee325326a2c5863c3a593376750992c42877d, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4958	4667	4680	4667
q2	367	146	163	146
q3	2035	1936	1914	1914
q4	1382	1288	1238	1238
q5	3962	3953	4021	3953
q6	246	135	132	132
q7	1401	881	902	881
q8	2766	2779	2771	2771
q9	9715	9666	9636	9636
q10	3489	3554	3533	3533
q11	380	244	249	244
q12	439	294	300	294
q13	4593	3801	3801	3801
q14	315	296	298	296
q15	581	530	523	523
q16	662	588	582	582
q17	1136	976	961	961
q18	7818	7292	7259	7259
q19	1668	1683	1676	1676
q20	567	305	301	301
q21	4368	3889	3934	3889
q22	473	367	374	367
Total cold run time: 53321 ms
Total hot run time: 49064 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4598	4594	4573	4573
q2	338	225	270	225
q3	4030	3984	3992	3984
q4	2726	2700	2699	2699
q5	9671	9620	9539	9539
q6	241	119	126	119
q7	3012	2449	2465	2449
q8	4480	4484	4451	4451
q9	13255	13063	13075	13063
q10	4102	4196	4194	4194
q11	760	645	700	645
q12	969	803	811	803
q13	4289	3552	3568	3552
q14	381	352	341	341
q15	572	526	519	519
q16	725	657	684	657
q17	3836	3876	3885	3876
q18	9667	8964	9030	8964
q19	1806	1764	1787	1764
q20	2411	2049	2058	2049
q21	8728	8606	8730	8606
q22	912	786	804	786
Total cold run time: 81509 ms
Total hot run time: 77858 ms

@doris-robot
Copy link

(From new machine)TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.15 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 33 seconds loaded 861443392 Bytes, about 24 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17098883542 Bytes

@morningman
Copy link
Contributor

Add comment to explain why there is no case to cover

Copy link
Contributor

@morningman morningman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Nov 21, 2023
Copy link
Contributor

PR approved by at least one committer and no changes requested.

Copy link
Contributor

PR approved by anyone and no changes requested.

Copy link
Collaborator

@LemonLiTree LemonLiTree left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zy-kkk zy-kkk merged commit 127525e into apache:master Nov 22, 2023
19 of 20 checks passed
@zy-kkk zy-kkk deleted the fix_jdbc_table_jackson branch November 22, 2023 07:22
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
In the previous PR apache#27124, we used `objectMapper.readValue` for deserialization. However, this method does not handle null fields, which can lead to issues when upgrading from older versions. Specifically, if a required field is missing in the persistent data, `String realColumnNamesJson = serializeMap.get(REAL_COLUMNS);` will return null, resulting in deserialization errors and frontend startup failure. This issue is likely to occur when upgrading from an older version that uses Jdbc Catalog to a new version including PR apache#27124. As this represents a specific upgrade scenario involving compatibility with old version data structures, it was not covered in the regular PR test cases. Given the specificity and difficulty in replicating such a scenario, no special test cases were added for this PR.
@zy-kkk zy-kkk removed the dev/2.0.4 label Dec 7, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
In the previous PR apache#27124, we used `objectMapper.readValue` for deserialization. However, this method does not handle null fields, which can lead to issues when upgrading from older versions. Specifically, if a required field is missing in the persistent data, `String realColumnNamesJson = serializeMap.get(REAL_COLUMNS);` will return null, resulting in deserialization errors and frontend startup failure. This issue is likely to occur when upgrading from an older version that uses Jdbc Catalog to a new version including PR apache#27124. As this represents a specific upgrade scenario involving compatibility with old version data structures, it was not covered in the regular PR test cases. Given the specificity and difficulty in replicating such a scenario, no special test cases were added for this PR.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by one committer. reviewed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants