formatting the AST while preserving the source location information from the original query #1634

lovasoa · 2025-01-02T14:05:01Z

In the current sqlparser, parsing

select
  a,
  b
from
  t

and then fromatting it back to a string results in

select a,b from t

Since sqlparser v0.53, we have the source location information inside the AST, which theoretically makes it possible to reconstruct the original query formatting (or at least getting very close to it).

I know this is a huge undertaking, but it would be really useful. For instance, in sqlpage, I would love being able to respect the initial user formatting of queries, in order for database errors that contain source position information to be meaningful and not confusing.

I'm opening this issue to start discussing this with the community (and especially with you @iffyio and @alamb 😉 ).

Am I the only one interested in this ? What would be the best approach to implementing this ?

…implementation) this implements (a tiny portion of) apache#1634 pros: really useful when passing formatted queries to a real database, in order for database error message locations to match the original user's source locations cons: if we want to do it well, we need to track source locations better, and this adds a complexity to the Display imlementations

freshtonic · 2025-01-04T11:41:06Z

@lovasoa 👋 I'm interested in this

alamb · 2025-01-04T19:28:54Z

I don't understand the usecase for preserving the original formatting after Display the AST.

Specifically, to preserve the original SQL for error messages you could also keep the actual original SQL string somewhere and on error construct the error message using the orignal SQL (and the locations in that text that are now in Spans)

I think you could take that approach today without any additional work.

As you hint in this description, trying to preserve the original formatting would likely be a very large undertaking (and thus I suspect practically infeasible given the amount of engineering we have available)

lovasoa · 2025-01-04T20:08:15Z

Specifically, to preserve the original SQL for error messages you could also keep the actual original SQL string somewhere and on error construct the error message using the orignal SQL (and the locations in that text that are now in Spans)

I think there is a misunderstanding. My use case is: the user enters a query. I need to parse it, potentially edit the ast, and then stringify it back before feeding it to the database. When the database returns an error, the error contains position information. This position information references the stringified AST we fed to the database, not the original string entered by the user. Currently, there is no way to map that position in the output of Display back to a position in the original string.

As you hint in this description, trying to preserve the original formatting would likely be a very large undertaking (and thus I suspect practically infeasible given the amount of engineering we have available)

That's true. If I'm the only one with that need, then it's probably not worth it. But maybe I'm not ? And the work does not have to be exhaustive to be useful. I've already done it just for respecting newlines and indentation in UPDATE statements (because the syntax is simpler than SELECT) in #1636. I think improving upon it and doing something similar at least for newlines around keywords and identifier lists in SELECT .. FROM ... WHERE would already be quite useful.

Regarding resources: @freshtonic , if this is useful to you, do you think this is something cipherstash could dedicate a few engineering hours to ?

alamb · 2025-01-04T21:53:10Z

I think there is a misunderstanding. My use case is: the user enters a query. I need to parse it, potentially edit the ast, and then stringify it back before feeding it to the database. When the database returns an error, the error contains position information. This position information references the stringified AST we fed to the database, not the original string entered by the user. Currently, there is no way to map that position in the output of Display back to a position in the original string

I see the challenge -- you have to potentially edit the AST

I wonder if you could do something (crazy) like

re-parse the stringified AST to get span information of the Display format
Walk the original AST and the re-parsed AST using the spans to create a mapping back to the original query

lovasoa · 2025-01-04T22:27:32Z

That could work, but it's not necessarily a lot less work than a Display implementation that respects the Span information, and the result would be wasteful and less generally useful to other users of sqlparser...

graup · 2025-01-10T21:43:34Z

I'm playing around with a different approach for editing parts of the AST that works more like tools like eslint work. The idea is that given accurate source spans, why don't we just edit the original query directly without round-tripping the whole AST?

The steps would be: 1. parse query into AST. 2. Visit the AST, looking for nodes we want to replace. 3. Construct and render a new AST node for local replacement. 4. Use the old node's span() to replace the new text right in the source.

This way we preserve all the other formatting outside of the node we're replacing, without any work on the AST displaying logic.

Of course, this needs accurate and complete spans which we don't have yet, but sounds feasible to me.

lovasoa · 2025-01-10T22:02:44Z

That would be great ! But with the current (incorrect) span information, we would end up with syntactically invalid sql.

alamb · 2025-01-11T17:57:02Z

That would be great ! But with the current (incorrect) span information, we would end up with syntactically invalid sql.

This seems like it is a reason to focus on improving the span accuracy (which is valuable for other reasons too) first before trying to also carry along the original tokens

lovasoa mentioned this issue Jan 2, 2025

preserve sql formatting through a parse + display roundtrip (partial implementation) #1636

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

formatting the AST while preserving the source location information from the original query #1634

formatting the AST while preserving the source location information from the original query #1634

lovasoa commented Jan 2, 2025

freshtonic commented Jan 4, 2025

alamb commented Jan 4, 2025 •

edited

Loading

lovasoa commented Jan 4, 2025

alamb commented Jan 4, 2025

lovasoa commented Jan 4, 2025

graup commented Jan 10, 2025

lovasoa commented Jan 10, 2025

alamb commented Jan 11, 2025

formatting the AST while preserving the source location information from the original query #1634

formatting the AST while preserving the source location information from the original query #1634

Comments

lovasoa commented Jan 2, 2025

freshtonic commented Jan 4, 2025

alamb commented Jan 4, 2025 • edited Loading

lovasoa commented Jan 4, 2025

alamb commented Jan 4, 2025

lovasoa commented Jan 4, 2025

graup commented Jan 10, 2025

lovasoa commented Jan 10, 2025

alamb commented Jan 11, 2025

alamb commented Jan 4, 2025 •

edited

Loading