Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

move projection pushdown optimization logic to ExecutionPlan trait #14235

Conversation

buraksenn
Copy link
Contributor

@buraksenn buraksenn commented Jan 22, 2025

Which issue does this PR close?

It does not directly closes but is related

Rationale for this change

For #11502 there is a ongoing work. Projection pushdown optimizer depends on CsvExec which blocks moving out of core crate. By moving swapping with projection logic to execs we are getting rid of dependencies and we can simply call try_swapping_with_projection for most execs.

Next steps:

  • Migrate projection pushdown out of core crate
  • Try to migrate other projection pushdowns

What changes are included in this PR?

  • Add try_swapping_with_projection method to trait
  • Move applicable methods from projection pushdown to trait implementations

Are these changes tested?

Existing tests in pushdown

Are there any user-facing changes?

No

@github-actions github-actions bot added physical-expr Physical Expressions optimizer Optimizer rules core Core DataFusion crate labels Jan 22, 2025
@berkaysynnada
Copy link
Contributor

Thank you @buraksenn. I'll take a look ASAP.

@buraksenn
Copy link
Contributor Author

buraksenn commented Jan 22, 2025

Pushed the rest of the code. I've only left ProjectionExec which I think should be in projection_pushdown.rs file.

I've one question, since projection pushdown has utils that other Execs depend on I've put it in datafusion/physical-plan/src/projection_utils.rs, I could not find a better place. I've wanted to put it in physical-optimizer but it caused cyclic dependency. Where should this logic be?

@berkaysynnada
Copy link
Contributor

@alamb are we okay with this API extension? I think it is inevitable at some point. I have a WIP PR (which has been in progress for a very long time, but I hope to get back to it soon) that would also benefit from such a method.

@github-actions github-actions bot added the sqllogictest SQL Logic Tests (.slt) label Jan 24, 2025
@alamb
Copy link
Contributor

alamb commented Jan 24, 2025

@alamb are we okay with this API extension? I think it is inevitable at some point. I have a WIP PR (which has been in progress for a very long time, but I hope to get back to it soon) that would also benefit from such a method.

Yes for sure -- I agree it is inevitable

In my mind a step forward for making the optimizers more general too -- for example as written ProjectionPushdown doesn't work for user defined ExecutionPlan nodes. With this new API it does

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for working on this @buraksenn and @berkaysynnada -- it looks quite close and is 👨‍🍳 👌 very nice

02)--FilterExec: c2@1 > 10, projection=[c1@0]
03)----RepartitionExec: partitioning=RoundRobinBatch(4), input_partitions=1
04)------CsvExec: file_groups={1 group: [[WORKSPACE_ROOT/testing/data/csv/aggregate_test_100.csv]]}, projection=[c1, c2], has_header=true
01)ProjectionExec: expr=[c1@0 as c1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of these plans don't look quite as good (they now have an unecessary projection)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, just moving the code changed the behavior. I got its cause and fixing now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems I've introduced that while moving it. Thanks for the fix @berkaysynnada

@@ -431,6 +434,20 @@ pub trait ExecutionPlan: Debug + DisplayAs + Send + Sync {
fn cardinality_effect(&self) -> CardinalityEffect {
CardinalityEffect::Unknown
}

/// Attempts to push down the given projection into the input of this `ExecutionPlan`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree it makes sense to expose ProjectionExec here in this API -- it is a very special operator. In theory it would be nice to avoid the circular reference (ProjectionExec is an ExecutionPlan that depends on ProjetionExec) but if it compiles I say we 🚢 🇮🇹

@buraksenn buraksenn changed the title add try_swapping_with_projection method to ExecutionPlan trait move projection pushdown optimization logic to ExecutionPlan trait Jan 24, 2025
@berkaysynnada
Copy link
Contributor

I am merging this once the CI passes one more

Copy link
Contributor

@berkaysynnada berkaysynnada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @buraksenn. Now it is ready to go 🚀

@berkaysynnada berkaysynnada merged commit f775791 into apache:main Jan 25, 2025
25 checks passed
@berkaysynnada berkaysynnada deleted the add-try-swapping-with-projection-to-ex-plan-trait branch January 25, 2025 19:30
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is epic work @berkaysynnada and @buraksenn -- the fact none of the tests was changed is 👌 very nice

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
core Core DataFusion crate optimizer Optimizer rules physical-expr Physical Expressions sqllogictest SQL Logic Tests (.slt)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants