When one runs PySpark code, each method call on a DataFrame (select, join, filter, etc.) eagerly adds a step to the query plan, and an invalid expression throws an error right away. In this library, the plan is computed at each step but only verified once the query is sent for execution; in my case the error is thrown when DuckDB tries to execute the query.
This is a problem during development: with a few hundred lines of PySpark code, I don't get a precise stack trace showing what went wrong. It points to the final line, where materialization occurs.
Expected behavior:
Verify that the query plan is valid with respect to the input data each time a step is added, and fail at that step when something cannot be resolved or is invalid.
Thanks for opening this; it is something I have thought about before.
I think the way to do this would be to run an explain plan after each operation to ensure the SQL is valid. For local engines, like DuckDB, this would have no negative user impact, but for remote engines it would slow things down. So I could see this being an option that is configurable by the user but enabled by default for DuckDB.
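To make the idea concrete, here is a rough sketch of per-step validation via `EXPLAIN`. It is not this library's implementation: `LazyQuery`, `StepValidationError`, and the `validate` flag are hypothetical names invented for illustration, and the standard-library `sqlite3` module stands in for DuckDB so the snippet runs without extra dependencies. The assumption is that the target engine rejects an unresolvable column when it compiles an `EXPLAIN` statement, which is what SQLite does here.

```python
import sqlite3


class StepValidationError(Exception):
    """Raised when a newly added plan step does not resolve against the data."""


class LazyQuery:
    """Hypothetical lazy query builder: each step wraps the previous SQL."""

    def __init__(self, conn, sql, validate=True, step="source"):
        self.conn = conn
        self.sql = sql
        self.validate = validate  # assumed to be user-configurable
        if validate:
            self._check(step)

    def _check(self, step):
        try:
            # EXPLAIN compiles the statement without running the query itself.
            self.conn.execute(f"EXPLAIN {self.sql}")
        except sqlite3.Error as exc:
            # Fail at the step that introduced the problem, not at collect().
            raise StepValidationError(f"invalid step {step}: {exc}") from exc

    def select(self, *cols):
        col_list = ", ".join(cols)
        sql = f"SELECT {col_list} FROM ({self.sql})"
        return LazyQuery(self.conn, sql, self.validate, step=f"select({col_list})")

    def filter(self, condition):
        sql = f"SELECT * FROM ({self.sql}) WHERE {condition}"
        return LazyQuery(self.conn, sql, self.validate, step=f"filter({condition})")

    def collect(self):
        # Materialization: the only place errors surface when validate=False.
        return self.conn.execute(self.sql).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO t VALUES (1, 'a'), (2, 'b')")

    query = LazyQuery(conn, "SELECT * FROM t").filter("id > 1")
    try:
        query.select("missing")  # raises here, at the offending step
    except StepValidationError as err:
        print(err)
```

With `validate=False` the same chain builds without complaint and the error only appears in `collect()`, which mirrors the current behavior described in the issue. The extra `EXPLAIN` round trip per step is the cost that would matter for remote engines.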