Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Use ORDER BY as part of MV's distribution key #19321

Closed
xxchan opened this issue Nov 8, 2024 · 2 comments · Fixed by #20176
Closed

Discussion: Use ORDER BY as part of MV's distribution key #19321

xxchan opened this issue Nov 8, 2024 · 2 comments · Fixed by #20176
Assignees
Milestone

Comments

@xxchan
Copy link
Member

xxchan commented Nov 8, 2024

context: https://risingwave-labs.slack.com/archives/C034TRPKN1F/p1731041466238309?thread_ts=1731032250.800469&cid=C034TRPKN1F

Some nodes:

  • dist_key just need to be subset of pk, not prefix.
  • MV pk is order_by | stream_key
  • before feat(optimizer): change stream join mv distribution key #13022, MV dist_key is upstream_dist_key (subset of stream_key). And we changed to whole stream_key.
  • For Index, we support distributed by. Since index is essentially the same as MV, there seem to be no reason not to also support it for MV
@github-actions github-actions bot added this to the release-2.2 milestone Nov 8, 2024
@xxchan xxchan changed the title Discussion: Use ORDER BY as MV's distribution key Discussion: Use ORDER BY as part of MV's distribution key Nov 8, 2024
@hzxa21
Copy link
Collaborator

hzxa21 commented Nov 13, 2024

An example to show what the dist key looks like now on main:

create table test_dist (v1 int, v2 int, v3 int, primary key (v1, v2));
create table test_dist2 (v1 int, v2 int, v3 int, primary key (v1, v2));
create table test_dist3 (v1 int, v2 int, v3 int, primary key (v1, v2));


describe test_dist;
-- primary key v1, v2
-- distribution key v1, v2

create materialized view test_dist_mv as select * from test_dist order by v1, v3;

describe test_dist_mv;
-- primary key v1, v3, v2
-- distribution key v1, v2

drop MATERIALIZED view test_join_dist;

create MATERIALIZED view test_join_dist as
select t1.v3 as a, t2.v3 as b, t3.v3 as c
from test_dist as t1
join test_dist2 as t2
on t1.v1 = t2.v1
join test_dist3 as t3
on t2.v1 = t3.v1;

describe test_join_dist;
-- primary key test_dist.v1, test_dist.v2, test_dist2.v2, test_dist3.v2, test_dist2.v1
-- distribution test_dist.v1, test_dist.v2, test_dist2.v2, test_dist3.v2, test_dist2.v1


create MATERIALIZED view test_join_dist2 as
select t1.v3 as a, t2.v3 as b, t3.v3 as c
from test_dist as t1
join test_dist2 as t2
on t1.v1 = t2.v1
join test_dist3 as t3
on t2.v1 = t3.v1
order by t1.v3, t1.v2;

describe test_join_dist2;
-- primary key test_dist.v3, test_dist.v2, test_dist.v1, test_dist2.v2, test_dist3.v2, test_dist2.v1
-- distribution key test_dist2.v1

ideally the dist key for test_dist and dist_join_dist2 can be:

describe test_dist_mv;
-- primary key v1, v3, v2
-- distribution key v1, v3, ...


describe test_join_dist2;
-- primary key test_dist.v3, test_dist.v2, test_dist.v1, test_dist2.v2, test_dist3.v2, test_dist2.v1
-- distribution key test_dist.v3, test_dist.v2, ...

Copy link
Contributor

This issue has been open for 60 days with no activity.

If you think it is still relevant today, and needs to be done in the near future, you can comment to update the status, or just manually remove the no-issue-activity label.

You can also confidently close this issue as not planned to keep our backlog clean.
Don't worry if you think the issue is still valuable to continue in the future.
It's searchable and can be reopened when it's time. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants