[feature](mtmv)(3)Implementing multi table materialized views #26146

Merged: 96 commits merged into apache:master on Nov 24, 2023

@zddr zddr commented Oct 31, 2023

Proposed changes

Issue Number: close #xxx

Implement mtmv

Introduction to Main Classes:

  • MTMVService: MTMV services for other modules to call
  • MTMVHookService: all operations that affect the MTMV
    • MTMVJobManager: all operations that affect the MTMV job
    • MTMVCacheManager: all operations that affect the MTMV cache
  • MTMVTask & MTMVJob: inherit from the job framework

Further comments

If this is a relatively large or complex change, kick off the discussion at [email protected] by explaining why you chose the solution you did and what alternatives you considered, etc...

@zddr zddr marked this pull request as draft October 31, 2023 03:35

@github-actions github-actions bot left a comment

clang-tidy made some suggestions

be/src/vec/exec/scan/vmeta_scanner.cpp (outdated, resolved)
zddr and others added 4 commits November 14, 2023 09:28
- add show task/job infos method
- add mtmv thread pool in disruptorGroupManager
return false;
}

public static MTMVCache generateMTMVCache(MTMV mtmv, ConnectContext ctx) {

The cache should contain the MTMV itself

private static final Logger LOG = LogManager.getLogger(MTMVCacheManager.class);
private Map<BaseTableInfo, Set<BaseTableInfo>> tableMTMVs = Maps.newConcurrentMap();

public Set<BaseTableInfo> getMtmvsByBaseTable(BaseTableInfo table) {

@seawinde seawinde Nov 23, 2023

Should also provide a method with a list parameter to get materialized views,
and should return the materialized view caches
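
One way to read this suggestion, as a minimal sketch: keep the base-table-to-MTMV index from the snippet above and add a lookup that accepts a list of base tables and returns the union of the MTMVs that depend on any of them. The class name `TableMtmvIndex`, the `register` method, the list-taking `getMtmvsByBaseTables` method, and the use of `String` in place of `BaseTableInfo` are all hypothetical stand-ins, not the PR's actual code. Returning caches instead of table infos, as the reviewer asks, would need the real `MTMVCache` type and is left out here.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for MTMVCacheManager; String replaces BaseTableInfo.
final class TableMtmvIndex {
    // base table -> MTMVs whose defining query reads that table
    private final Map<String, Set<String>> tableMtmvs = new ConcurrentHashMap<>();

    // Record that `mtmv` depends on `baseTable`.
    void register(String baseTable, String mtmv) {
        tableMtmvs.computeIfAbsent(baseTable, k -> ConcurrentHashMap.newKeySet()).add(mtmv);
    }

    // Single-table lookup, mirroring getMtmvsByBaseTable in the snippet above.
    Set<String> getMtmvsByBaseTable(String baseTable) {
        return tableMtmvs.getOrDefault(baseTable, Collections.emptySet());
    }

    // List-parameter variant (assumed shape): union over all given base tables.
    Set<String> getMtmvsByBaseTables(List<String> baseTables) {
        Set<String> result = new LinkedHashSet<>();
        for (String table : baseTables) {
            result.addAll(getMtmvsByBaseTable(table));
        }
        return result;
    }
}
```

A list-based lookup lets a caller that knows every base table touched by a query collect the candidate MTMVs in one call rather than looping over the single-table method.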

zddr commented Nov 23, 2023

run buildall

@github-actions github-actions bot removed the approved label Nov 23, 2023

zddr commented Nov 23, 2023

run feut

zddr commented Nov 23, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit 719fc657b064a7340399bac12b254ab6313d79d0, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4953	4637	4660	4637
q2	360	150	157	150
q3	2042	1921	1940	1921
q4	1388	1246	1250	1246
q5	3951	3936	4016	3936
q6	250	131	134	131
q7	1417	898	889	889
q8	2779	2782	2781	2781
q9	9558	9536	9430	9430
q10	3476	3513	3525	3513
q11	387	237	247	237
q12	437	297	292	292
q13	4558	3813	3821	3813
q14	316	297	283	283
q15	581	540	526	526
q16	657	594	581	581
q17	1127	970	957	957
q18	7690	7310	7307	7307
q19	1675	1683	1676	1676
q20	575	301	290	290
q21	4427	3975	4011	3975
q22	469	371	379	371
Total cold run time: 53073 ms
Total hot run time: 48942 ms
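
The bot output does not label its columns. The numbers are consistent with the following reading, which is an inference from the arithmetic and not something the report states: the first timing per query is the cold run, the next two are hot runs, and the last column is the faster of the two hot runs. The sketch below recomputes both totals from the first three columns of the table above; the class name `TpchTotals` and its methods are hypothetical.

```java
// Timings in ms copied from the table above: {run 1, run 2, run 3} for q1..q22.
final class TpchTotals {
    static final int[][] RUNS = {
        {4953, 4637, 4660}, {360, 150, 157},    {2042, 1921, 1940}, {1388, 1246, 1250},
        {3951, 3936, 4016}, {250, 131, 134},    {1417, 898, 889},   {2779, 2782, 2781},
        {9558, 9536, 9430}, {3476, 3513, 3525}, {387, 237, 247},    {437, 297, 292},
        {4558, 3813, 3821}, {316, 297, 283},    {581, 540, 526},    {657, 594, 581},
        {1127, 970, 957},   {7690, 7310, 7307}, {1675, 1683, 1676}, {575, 301, 290},
        {4427, 3975, 4011}, {469, 371, 379},
    };

    // Assumption: run 1 of each query is the cold run.
    static int coldTotal(int[][] runs) {
        int total = 0;
        for (int[] r : runs) {
            total += r[0];
        }
        return total;
    }

    // Assumption: a query's hot time is the faster of runs 2 and 3.
    static int hotTotal(int[][] runs) {
        int total = 0;
        for (int[] r : runs) {
            total += Math.min(r[1], r[2]);
        }
        return total;
    }
}
```

With these 22 rows, `coldTotal(RUNS)` is 53073 and `hotTotal(RUNS)` is 48942, matching the two totals reported above.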

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4610	4589	4561	4561
q2	332	217	259	217
q3	4019	3986	3989	3986
q4	2708	2690	2693	2690
q5	9596	9633	9605	9605
q6	246	123	123	123
q7	3046	2474	2489	2474
q8	4467	4406	4501	4406
q9	12915	12730	12849	12730
q10	4097	4194	4195	4194
q11	804	688	697	688
q12	974	812	819	812
q13	4279	3581	3608	3581
q14	381	348	344	344
q15	581	515	517	515
q16	740	689	671	671
q17	3958	3844	3857	3844
q18	9522	9071	9144	9071
q19	1842	1768	1780	1768
q20	2409	2075	2052	2052
q21	8785	8494	8527	8494
q22	873	814	778	778
Total cold run time: 81184 ms
Total hot run time: 77604 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 43.69 seconds
stream load tsv: 577 seconds loaded 74807831229 Bytes, about 123 MB/s
stream load json: 18 seconds loaded 2358488459 Bytes, about 124 MB/s
stream load orc: 65 seconds loaded 1101869774 Bytes, about 16 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.9 seconds inserted 10000000 Rows, about 346K ops/s
storage size: 17096685288 Bytes
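
The "about N MB/s" figures for the stream loads appear to be bytes divided by seconds and by 2^20, rounded down. That is an inference from the numbers above, not taken from the pipeline's script, and the helper name `mibPerSecond` is hypothetical. A minimal sketch:

```java
final class LoadThroughput {
    private static final long BYTES_PER_MIB = 1024L * 1024L;

    // Whole MiB per second, rounded down (assumed to be how the report derives "MB/s").
    static long mibPerSecond(long bytes, long seconds) {
        return bytes / seconds / BYTES_PER_MIB;
    }
}
```

Under that assumption, `mibPerSecond(74807831229L, 577)` gives 123 for the TSV load and `mibPerSecond(1101869774L, 65)` gives 16 for the ORC load, in line with the reported values.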

zddr commented Nov 23, 2023

run buildall

@doris-robot

TPC-H test result on machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
Tpch sf100 test result on commit a7759cc81fab02e18ea8ac9a02f156f4aeb5ca34, data reload: false

run tpch-sf100 query with default conf and session variables
q1	4910	4675	4639	4639
q2	376	143	161	143
q3	2023	1904	1894	1894
q4	1376	1262	1204	1204
q5	3975	3992	4102	3992
q6	254	125	129	125
q7	1419	876	866	866
q8	2768	2786	2772	2772
q9	59051	13490	9454	9454
q10	13150	3540	3496	3496
q11	400	234	241	234
q12	1680	290	295	290
q13	21854	3780	3789	3780
q14	330	287	304	287
q15	581	541	526	526
q16	674	584	599	584
q17	1150	965	931	931
q18	7771	7615	7335	7335
q19	2315	1667	1664	1664
q20	539	311	274	274
q21	7819	4017	3973	3973
q22	482	384	375	375
Total cold run time: 134897 ms
Total hot run time: 48838 ms

run tpch-sf100 query with default conf and set session variable runtime_filter_mode=off
q1	4653	4574	4594	4574
q2	324	244	258	244
q3	4023	4007	3992	3992
q4	2715	2686	2702	2686
q5	9663	9671	9724	9671
q6	247	118	123	118
q7	3030	2453	2456	2453
q8	4423	4480	4447	4447
q9	12945	12958	12838	12838
q10	4088	4160	4201	4160
q11	774	652	692	652
q12	975	823	822	822
q13	5822	3591	3509	3509
q14	384	356	343	343
q15	573	531	529	529
q16	743	676	675	675
q17	3893	3782	3963	3782
q18	9643	9085	9016	9016
q19	1832	1786	1771	1771
q20	2410	2093	2057	2057
q21	8878	8663	8414	8414
q22	865	814	796	796
Total cold run time: 82903 ms
Total hot run time: 77549 ms

@doris-robot

(From new machine) TeamCity pipeline, clickbench performance test result:
the sum of best hot time: 45.78 seconds
stream load tsv: 572 seconds loaded 74807831229 Bytes, about 124 MB/s
stream load json: 19 seconds loaded 2358488459 Bytes, about 118 MB/s
stream load orc: 66 seconds loaded 1101869774 Bytes, about 15 MB/s
stream load parquet: 32 seconds loaded 861443392 Bytes, about 25 MB/s
insert into select: 28.7 seconds inserted 10000000 Rows, about 348K ops/s
storage size: 17099610803 Bytes

@morningman morningman left a comment

LGTM

@github-actions github-actions bot added the approved label Nov 24, 2023

PR approved by at least one committer and no changes requested.

@morningman morningman merged commit dfe3a2d into apache:master Nov 24, 2023
17 of 18 checks passed
seawinde pushed a commit to seawinde/doris that referenced this pull request Nov 28, 2023
XuJianxu pushed a commit to XuJianxu/doris that referenced this pull request Dec 14, 2023
@zddr zddr deleted the mv_3 branch March 28, 2024 02:36
starocean999 pushed a commit that referenced this pull request Sep 20, 2024
…zed view (#40658)

## Proposed changes
This was introduced by #26146.
Creating a materialized view as follows should fail, because it has the
duplicate column names `o_orderdatE` and `o_orderdate`. Currently, however, the
materialized view is created successfully. This PR fixes that.

```sql
        CREATE MATERIALIZED VIEW mv_1
        BUILD IMMEDIATE REFRESH AUTO ON MANUAL 
        partition by(o_orderdate) 
        DISTRIBUTED BY RANDOM BUCKETS 2 
        PROPERTIES ('replication_num' = '1') 
        AS  
        select o_orderdatE, o_shippriority, o_comment, o_orderdate, 
        sum(o_totalprice) as sum_total, 
        max(o_totalpricE) as max_total, 
        min(o_totalprice) as min_total, 
        count(*) as count_all, 
        bitmap_union(to_bitmap(case when o_shippriority > 1 and o_orderkey IN (1, 3) then o_custkey else null end)) cnt_1, 
        bitmap_union(to_bitmap(case when o_shippriority > 2 and o_orderkey IN (2) then o_custkey else null end)) as cnt_2 
        from (select * from orders) as t1
        group by 
        o_orderdatE, 
        o_shippriority, 
        o_comment,
        o_orderdate;
```
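
The check this description calls for can be illustrated with a short sketch, assuming column names are compared case-insensitively, as the description implies for `o_orderdatE` and `o_orderdate`. This is not the code from #40658; the class `DuplicateColumnCheck` and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class DuplicateColumnCheck {
    // Returns every name that collides with an earlier name once case is ignored.
    static List<String> findDuplicates(List<String> columnNames) {
        Set<String> seen = new HashSet<>();
        List<String> duplicates = new ArrayList<>();
        for (String name : columnNames) {
            // Set.add returns false when the lower-cased name was already present.
            if (!seen.add(name.toLowerCase(Locale.ROOT))) {
                duplicates.add(name);
            }
        }
        return duplicates;
    }
}
```

Applied to the select list above (`o_orderdatE`, `o_shippriority`, `o_comment`, `o_orderdate`, ...), the method reports `o_orderdate` as a duplicate, which is the case in which the statement would be expected to be rejected.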
dataroaring pushed a commit that referenced this pull request Sep 26, 2024
seawinde added a commit to seawinde/doris that referenced this pull request Oct 14, 2024
Labels: approved, meta-change, reviewed
5 participants