-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
move projection pushdown optimization logic to ExecutionPlan trait #14235
move projection pushdown optimization logic to ExecutionPlan trait #14235
Conversation
Thank you @buraksenn. I'll take a look ASAP. |
Pushed the rest of the code. I've only left ProjectionExec which I think should be in projection_pushdown.rs file. I've one question, since projection pushdown has utils that other Execs depend on I've put it in |
@alamb are we okay with this API extension? I think it is inevitable at some point. I have a WIP PR (which has been in progress for a very long time, but I hope to get back to it soon) that would also benefit from such a method. |
Yes for sure -- I agree it is inevitable In my mind a step forward for making the optimizers more general too -- for example as written ProjectionPushdown doesn't work for user defined ExecutionPlan nodes. With this new API it does |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for working on this @buraksenn and @berkaysynnada -- it looks quite close and is 👨🍳 👌 very nice
02)--FilterExec: c2@1 > 10, projection=[c1@0] | ||
03)----RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1 | ||
04)------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c2], has_header=true | ||
01)ProjectionExec: expr=[c1@0 as c1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of these plans don't look quite as good (they now have an unecessary projection)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, just moving the code changed the behavior. I got its cause and fixing now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems I've introduced that while moving it. Thanks for the fix @berkaysynnada
@@ -431,6 +434,20 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync { | |||
fn cardinality_effect(&self) -> CardinalityEffect { | |||
CardinalityEffect::Unknown | |||
} | |||
|
|||
/// Attempts to push down the given projection into the input of this `ExecutionPlan`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree it makes sense to expose ProjectionExec here in this API -- it is a very special operator. In theory it would be nice to avoid the circular reference (ProjectionExec
is an ExecutionPlan
that depends on ProjetionExec
) but if it compiles I say we 🚢 🇮🇹
I am merging this once the CI passes one more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @buraksenn. Now it is ready to go 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is epic work @berkaysynnada and @buraksenn -- the fact none of the tests was changed is 👌 very nice
Which issue does this PR close?
It does not directly closes but is related
Rationale for this change
ProjectionPushdown
intodatafusion-physical-optimizer
crate #14184.For #11502 there is a ongoing work. Projection pushdown optimizer depends on CsvExec which blocks moving out of core crate. By moving swapping with projection logic to execs we are getting rid of dependencies and we can simply call
try_swapping_with_projection
for most execs.Next steps:
What changes are included in this PR?
try_swapping_with_projection
method to traitAre these changes tested?
Existing tests in pushdown
Are there any user-facing changes?
No