Eager planning errors #198

Open
eredzik opened this issue Nov 4, 2024 · 1 comment
Labels
enhancement New feature or request

@eredzik
Contributor

eredzik commented Nov 4, 2024

When you run PySpark code, each DataFrame method call (select, join, filter, etc.) eagerly adds a step to the query plan, and if an expression is invalid, it throws an error right away. In this library, the plan is computed on each step but is only validated once the query is sent for execution; in my case, the error is thrown when DuckDB tries to execute the query.

This is a problem during development: when I have a few hundred lines of PySpark code, I don't get a precise stack trace of what went wrong, because it points to the final line where materialization occurs.

Expected behavior:
At each step added to the plan, verify that the query plan is valid with respect to the input data, and fail to add the step when something cannot be resolved or is invalid.
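A minimal sketch of the expected behavior, outside the library: each method builds SQL on top of the previous step and asks the engine to plan it with `EXPLAIN` before returning, so a bad column name fails on the call that introduced it. This assumes Python's built-in `sqlite3` as a stand-in engine; `LazyFrame` and `PlanError` are hypothetical names, not the library's API.

```python
import sqlite3


class PlanError(Exception):
    """Raised by the call that added an invalid step (hypothetical)."""


class LazyFrame:
    """Hypothetical lazy frame: each method wraps the previous SQL in a subquery."""

    def __init__(self, con: sqlite3.Connection, sql: str) -> None:
        self.con = con
        self.sql = sql
        # Eager check: plan the query now instead of waiting for collect().
        self._validate()

    def _validate(self) -> None:
        try:
            # EXPLAIN compiles the statement, which resolves tables and
            # columns, without returning the query's result rows.
            self.con.execute(f"EXPLAIN {self.sql}")
        except sqlite3.Error as exc:
            raise PlanError(f"invalid step {self.sql!r}: {exc}") from exc

    def select(self, *cols: str) -> "LazyFrame":
        return LazyFrame(self.con, f"SELECT {', '.join(cols)} FROM ({self.sql})")

    def filter(self, condition: str) -> "LazyFrame":
        return LazyFrame(self.con, f"SELECT * FROM ({self.sql}) WHERE {condition}")

    def collect(self) -> list:
        # Only here is the query actually executed.
        return self.con.execute(self.sql).fetchall()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, age INTEGER)")
con.execute("INSERT INTO people VALUES ('a', 30), ('b', 15)")

df = LazyFrame(con, "SELECT * FROM people")
adults = df.filter("age >= 18").select("name")
print(adults.collect())  # [('a',)]

try:
    df.select("nmae")  # typo in the column name: fails on this line, not at collect()
except PlanError as exc:
    print(exc)
```

The trade-off is one extra planning call per method, which is what makes the stack trace point at the offending line.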

eakmanrq added the enhancement (New feature or request) label Nov 19, 2024
@eakmanrq
Owner

Thanks for opening this, and it is something I have thought about before.

I think the way to do this would be to run an explain plan after each operation to ensure the SQL is valid. For local engines like DuckDB, this would have no negative user impact, but for remote engines it would slow things down. So I could see this being an option that is configurable by the user but enabled by default for DuckDB.
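Roughly, the configurable default could look like this. It is a sketch, not the library's API: `LOCAL_ENGINES`, `eager_validation_enabled`, and `validate_step` are hypothetical names, the engine strings are illustrative, and Python's `sqlite3` stands in for a local engine connection.

```python
import sqlite3
from typing import Optional

# Hypothetical: engines that run in-process, where one extra EXPLAIN per
# DataFrame operation is cheap. A remote engine would pay a round trip each time.
LOCAL_ENGINES = {"duckdb", "sqlite"}


def eager_validation_enabled(engine: str, user_setting: Optional[bool] = None) -> bool:
    """Return whether to run EXPLAIN after each operation.

    An explicit user setting always wins; otherwise validation defaults
    to on for local engines only.
    """
    if user_setting is not None:
        return user_setting
    return engine.lower() in LOCAL_ENGINES


def validate_step(con: sqlite3.Connection, sql: str, enabled: bool) -> None:
    """Plan `sql` with EXPLAIN when enabled; raises sqlite3.Error if it is invalid."""
    if enabled:
        con.execute(f"EXPLAIN {sql}")


print(eager_validation_enabled("duckdb"))                       # True
print(eager_validation_enabled("bigquery"))                     # False
print(eager_validation_enabled("bigquery", user_setting=True))  # True
```

Letting an explicit setting override the per-engine default keeps the fast path for remote engines while still allowing users to opt in when they want precise error locations.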
