Add a hint about normalization in error message #14089
Comments
+1, this would be great! That being said, I do believe that in the long run BC-breaking changes are a good thing. Perhaps it makes sense to aggregate changes like these into a central issue, and the next time there's a big release that requires BC changes, we can include these small ones as well.
I am not opposed to breaking changes per se :) (see the past history of DataFusion). However, in this case I think the challenge is that either default will be good for some people and bad for others. It is unclear to me how we would decide whether more people are helped or hurt by changing the default (I would guess it is about 50/50; for example, I think it would certainly cause trouble for us at InfluxData...)
take
SQL spec seems to say that unquoted (not delimited) identifiers are equivalent to their upper-case delimited variants (ie Quoting SQL'16 here
However, DataFusion follows the PostgreSQL variant of SQL, so my interpretation is that our intended behavior is "like in the SQL spec, but lower-case".
I agree with @findepi. I think of the automatic lowercasing in DataFusion as an implementation detail of how the case normalization is done.
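To make the behavior described in the two comments above concrete, here is a minimal sketch of PostgreSQL-style identifier normalization: unquoted identifiers are folded to lower case, double-quoted ones keep their case. This is an illustration only; `normalize_ident` is a hypothetical name and is not DataFusion's actual implementation.

```rust
/// Hypothetical helper (not a DataFusion API): normalize a SQL identifier
/// the way this thread describes, i.e. "like the SQL spec, but lower-case".
fn normalize_ident(raw: &str) -> String {
    if raw.len() >= 2 && raw.starts_with('"') && raw.ends_with('"') {
        // Delimited identifier: strip the outer quotes and keep the case.
        // A doubled quote ("") inside stands for one literal quote.
        raw[1..raw.len() - 1].replace("\"\"", "\"")
    } else {
        // Unquoted identifier: fold to lower case.
        raw.to_lowercase()
    }
}
```

Under this sketch, an unquoted userId in a query becomes userid and no longer matches a schema field named userId, while the quoted form "userId" still does.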
* Add a hint about normalization in error message (#14089)
* normalization suggestion is only shown when a column name matches schema
---------
Co-authored-by: Sergey Zhukov <[email protected]>
Is your feature request related to a problem or challenge?
There has been significant and long-standing confusion about how to refer to columns with capitalization, most recently:
I think the root cause is that SQL is largely case-insensitive, but many DataFrame-like systems are case-sensitive.
There is a larger question of whether we could have less confusing defaults, but I think we could at least make the error messages better.
Today, if you have a schema with a field named userId (note the capital I) and you run a query that references it, you will get a seemingly nonsensical error:
Describe the solution you'd like
I would like to improve the error by adding a hint, shown when a column name matches a schema field except for case, that explains how to fix it.
In the example above something like:
So the whole error would be something like
Describe alternatives you've considered
No response
Additional context
I think this would be super helpful and a nice first issue
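For anyone picking this up, the check behind the proposed hint could look roughly like the sketch below. It is a standalone illustration under my own assumptions: `case_mismatch_hint` is a hypothetical function, the hint wording is made up, and neither is taken from the DataFusion codebase. It only produces a hint when the requested name matches a schema field except for case.

```rust
/// Hypothetical helper (not a DataFusion API): given the already-normalized
/// column name from the query and the schema's field names, return a hint
/// if some field differs from the requested name only by case.
fn case_mismatch_hint(requested: &str, schema_fields: &[&str]) -> Option<String> {
    // An exact match means the lookup would succeed, so no hint is needed.
    if schema_fields.iter().any(|f| *f == requested) {
        return None;
    }
    schema_fields
        .iter()
        // ASCII-only comparison; full Unicode case folding is out of scope here.
        .find(|f| f.eq_ignore_ascii_case(requested))
        // The message text is illustrative, not the real error text.
        .map(|f| {
            format!(
                "a field named \"{}\" exists; use double quotes to preserve case: \"{}\"",
                f, f
            )
        })
}
```

With a schema containing userId, a lookup of the normalized name userid yields a hint pointing at the quoted form, while a name that matches no field in any casing yields no hint.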