Parse Postgres's LOCK TABLE statement #1614
7ce8f33
to
47a3e5e
Compare
Thanks @freshtonic!
src/parser/mod.rs
let projection = if dialect_of!(self is PostgreSqlDialect | GenericDialect)
    && self.peek_keyword(Keyword::FROM)
{
    vec![]
} else {
    self.parse_projection()?
};
we can probably skip these changes in this PR given it's now in #1613?
@iffyio oh my bad - I should have branched from the apache main branch instead of our fork's main before pushing. I'll remedy this.
@iffyio I tried this and it's not straightforward without storing a value on the variant that identifies the dialect that was used to parse the AST.
The following syntax would be problematic (to render, in Display):
LOCK customers;
In PG, the TABLE keyword is optional. In MySQL one of TABLE or TABLES is mandatory.
The Display impl for Statement, in the LockTable { .. } match arm, could potentially generate SQL that will not be parsable by Postgres if a TABLES keyword is emitted.
Is there precedent for choosing how to render an AST fragment using a stored value to encode the dialect (or a proxy to the dialect) that was used to parse the AST?
ah yeah we could probably use an enum to represent the variants, something like?
enum LockTableKind {
    TABLE,
    TABLES,
}
Statement::LockTable { table_kind: Option<LockTableKind> }
see TableSampleKind for example
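To make the suggestion concrete, here is a minimal sketch of how an optional keyword kind could drive rendering so that `LOCK customers;` round-trips without a `TABLE`/`TABLES` keyword being invented. The struct is a simplified stand-in with a single `String` table name, and the variant names are CamelCased; none of this is the crate's actual AST.

```rust
use std::fmt;

/// Which table keyword, if any, appeared in the source SQL.
/// Hypothetical type mirroring the suggestion above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockTableKind {
    Table,
    Tables,
}

/// Simplified stand-in for a LOCK statement (assumption: one table, plain name).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct LockTable {
    pub table_kind: Option<LockTableKind>,
    pub table: String,
}

impl fmt::Display for LockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        // Emit exactly the keyword that was parsed, or none at all.
        match self.table_kind {
            Some(LockTableKind::Table) => f.write_str(" TABLE")?,
            Some(LockTableKind::Tables) => f.write_str(" TABLES")?,
            None => {}
        }
        write!(f, " {}", self.table)
    }
}
```

Because the keyword is recorded rather than inferred from the dialect, Display never has to guess which spelling the original statement used.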
47a3e5e
to
cd9919b
Compare
@iffyio I've pushed another attempt at this.
Munging the Postgres and MySQL versions together into the same
In my opinion, introducing a
I should point out that this is a breaking change to the
Due to it being a breaking change, and because directly signalling the DB dialect in the AST does not appear to be an idiom that is used elsewhere, I don't have much confidence this PR will be accepted, but one can hope :)
    /// ```sql
    /// UNLOCK TABLES
    /// ```
    /// Note: this is a MySQL-specific statement. See <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
-   UnlockTables,
+   UnlockTables(bool),
-   UnlockTables(bool),
+   UnlockTables(UnlockTables),
maybe we use a dedicated struct here as well? It's not clear what the bool property implies otherwise
pub struct LockTable {
    pub table: Ident,
#[non_exhaustive]
pub enum LockTables {
I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects. Representation wise, I think both variants can be merged into a struct with something like the following?
struct TableLock {
    pub table: ObjectName,
    pub alias: Option<Ident>,
    pub lock_type: Option<LockTableType>,
}
struct LockTable {
    pluralized_table_keyword: Option<bool>, // If None, then no table keyword was provided
    locks: Vec<TableLock>,
    lock_mode: Option<LockTableType>,
    only: bool,
    no_wait: bool,
}
similarly, the parser uses the same impl to create the struct
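As a rough illustration of what a single Display impl over that merged struct might look like, the sketch below follows the field layout in the suggestion. It is not code from the PR: names are plain `String`s instead of `ObjectName`/`Ident`, and `LockTableType` is cut down to three hypothetical variants.

```rust
use std::fmt;

/// Hypothetical, abbreviated lock type mixing MySQL and Postgres modes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockTableType {
    Read,
    Write,
    AccessExclusive,
}

impl fmt::Display for LockTableType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            LockTableType::Read => "READ",
            LockTableType::Write => "WRITE",
            LockTableType::AccessExclusive => "ACCESS EXCLUSIVE",
        })
    }
}

pub struct TableLock {
    pub table: String,
    pub alias: Option<String>,
    pub lock_type: Option<LockTableType>,
}

pub struct LockTable {
    pub pluralized_table_keyword: Option<bool>,
    pub locks: Vec<TableLock>,
    pub lock_mode: Option<LockTableType>,
    pub only: bool,
    pub no_wait: bool,
}

impl fmt::Display for LockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        match self.pluralized_table_keyword {
            Some(true) => f.write_str(" TABLES")?,
            Some(false) => f.write_str(" TABLE")?,
            None => {}
        }
        if self.only {
            f.write_str(" ONLY")?;
        }
        for (i, lock) in self.locks.iter().enumerate() {
            // First table is preceded by a space, the rest by a comma.
            write!(f, "{}{}", if i == 0 { " " } else { ", " }, lock.table)?;
            if let Some(alias) = &lock.alias {
                write!(f, " AS {}", alias)?;
            }
            if let Some(lock_type) = lock.lock_type {
                write!(f, " {}", lock_type)?;
            }
        }
        if let Some(mode) = self.lock_mode {
            write!(f, " IN {} MODE", mode)?;
        }
        if self.no_wait {
            f.write_str(" NOWAIT")?;
        }
        Ok(())
    }
}
```

Nothing in this shape stops a caller from setting both a per-table `lock_type` and a statement-level `lock_mode`; the impl simply renders whatever fields are populated.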
I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects
MySQL's & Postgres's LOCK statements have minimal overlap. They are similar in name only.
- MySQL allows different lock modes per table vs Postgres one lock mode applied to all tables
- MySQL's lock modes and Postgres's lock modes do not overlap at all
- MySQL has freely interchangeable table keywords, one of which MUST be present: TABLE or TABLES
- Postgres has one TABLE keyword but it's optional
- Postgres supports additional (optional) ONLY and NOWAIT keywords
- It is never valid to mix and match the Postgres-specific syntax with MySQL-specific syntax.
I agree that explicit database-specific AST fragments make parser reuse more difficult, but in the case of LOCK pretty much nothing is reusable.
The db-specific AST pieces do make implementing Display a lot less error prone.
It also makes it (almost) impossible to represent an invalid AST (e.g. a mix of PG & MySQL), except that I didn't take a hard stance on this for LockTableType, which does mix MySQL & Postgres bits.
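For comparison, a dialect-split representation along the lines argued for here could be sketched as below. The field names, the `(String, MySqlLockType)` table pairs, and the abbreviated `PostgresLockMode` are illustrative assumptions, not the PR's actual definitions.

```rust
use std::fmt;

/// MySQL lock types: READ [LOCAL] | [LOW_PRIORITY] WRITE.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MySqlLockType {
    Read,
    ReadLocal,
    Write,
    LowPriorityWrite,
}

/// Postgres lock modes (abbreviated; remaining modes omitted for brevity).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PostgresLockMode {
    AccessShare,
    AccessExclusive,
}

/// One variant per dialect, so a mixed statement cannot be constructed.
pub enum LockTables {
    MySql {
        pluralized_table_keyword: bool,
        tables: Vec<(String, MySqlLockType)>,
    },
    Postgres {
        has_table_keyword: bool,
        only: bool,
        tables: Vec<String>,
        lock_mode: Option<PostgresLockMode>,
        no_wait: bool,
    },
}

impl fmt::Display for LockTables {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LockTables::MySql { pluralized_table_keyword, tables } => {
                // MySQL always requires one of the two keywords.
                f.write_str(if *pluralized_table_keyword { "LOCK TABLES " } else { "LOCK TABLE " })?;
                let parts: Vec<String> = tables
                    .iter()
                    .map(|(name, ty)| {
                        let ty = match ty {
                            MySqlLockType::Read => "READ",
                            MySqlLockType::ReadLocal => "READ LOCAL",
                            MySqlLockType::Write => "WRITE",
                            MySqlLockType::LowPriorityWrite => "LOW_PRIORITY WRITE",
                        };
                        format!("{} {}", name, ty)
                    })
                    .collect();
                f.write_str(&parts.join(", "))
            }
            LockTables::Postgres { has_table_keyword, only, tables, lock_mode, no_wait } => {
                f.write_str("LOCK")?;
                if *has_table_keyword {
                    f.write_str(" TABLE")?;
                }
                if *only {
                    f.write_str(" ONLY")?;
                }
                write!(f, " {}", tables.join(", "))?;
                if let Some(mode) = lock_mode {
                    let mode = match mode {
                        PostgresLockMode::AccessShare => "ACCESS SHARE",
                        PostgresLockMode::AccessExclusive => "ACCESS EXCLUSIVE",
                    };
                    write!(f, " IN {} MODE", mode)?;
                }
                if *no_wait {
                    f.write_str(" NOWAIT")?;
                }
                Ok(())
            }
        }
    }
}
```

Each match arm only ever sees fields that are meaningful for its dialect, which is the "less error prone" property described above; the cost is that parser and AST code cannot be shared across the two arms.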
None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.
None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.
What I mean by this:
When pieces of dialect-specific syntax are mixed in the same AST struct/enum variant the consumers have to understand the dialect-specific differences in order to know which fields they can safely ignore.
At CipherStash I wrote a type-inferencer for SQL statements in order to determine if specific transformations can be performed safely. It uses sqlparser's AST and there are a lot of cases where I had to spend time understanding which AST node fields or combinations of field values I can safely ignore when I'm only targeting Postgres.
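A small sketch of the kind of check this pushes onto a Postgres-only consumer of a merged struct is shown below. The struct is a simplified, hypothetical copy of the shape proposed earlier in the thread (strings instead of real AST types), and `postgres_violations` is an invented helper name, not something sqlparser provides.

```rust
/// Hypothetical, simplified copy of the merged struct from the review suggestion.
pub struct TableLock {
    pub table: String,
    pub alias: Option<String>,
    pub lock_type: Option<String>,
}

pub struct LockTable {
    pub pluralized_table_keyword: Option<bool>,
    pub locks: Vec<TableLock>,
    pub lock_mode: Option<String>,
    pub only: bool,
    pub no_wait: bool,
}

/// Lists the reasons a merged `LockTable` cannot be a Postgres LOCK statement.
/// An empty result means every populated field is one Postgres understands.
pub fn postgres_violations(stmt: &LockTable) -> Vec<&'static str> {
    let mut problems = Vec::new();
    if stmt.pluralized_table_keyword == Some(true) {
        problems.push("TABLES keyword is MySQL-only");
    }
    if stmt.locks.iter().any(|l| l.lock_type.is_some()) {
        problems.push("per-table lock types are MySQL-only");
    }
    if stmt.locks.iter().any(|l| l.alias.is_some()) {
        problems.push("table aliases are MySQL-only");
    }
    problems
}
```

Every downstream crate targeting one dialect would need to rediscover and maintain a list like this for each merged node it touches.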
Yeah I agree that downstream crates targeting a single dialect would be easier to implement by essentially having dialect-specific AST representations (on the other extreme there are downstream crates that would like to process the AST in a dialect-agnostic manner, and we also have custom dialects in other downstream crates that need support). I think there are pros/cons to this approach vs the current one followed by the parser, which puts some of the responsibility on the downstream crate. I'm thinking in any case ideally we would want to keep to the current approach for this PR, while a shift in approach could be tackled as its own dedicated proposal.
I agree with @iffyio that following the existing patterns in this crate is the best thing for this PR.
If we were starting a new project, having dialect-specific AST structures would make sense to consider, but at this point ensuring the code is consistent is more important in my opinion.
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
#[non_exhaustive]
#[non_exhaustive]
I think we tend not to use this attribute. There are pros/cons to using it, but better to keep with the existing convention in this PR.
See: https://www.postgresql.org/docs/current/sql-lock.html

PG's full syntax for this statement is supported:

```
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
```

MySQL and Postgres support very different syntax for `LOCK TABLE`, so this is implemented as a breaking change to the `Statement::LockTables { .. }` variant, which now accepts a `LockTables` enum with variants for MySQL and Postgres.
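The `lockmode` alternatives in that grammar map naturally onto keyword sequences. The following is a standalone sketch of that mapping over plain string tokens; `LockMode` and `parse_lock_mode` are hypothetical names, and the real parser works on sqlparser's own token and keyword types rather than `&str`.

```rust
/// The eight Postgres lock modes listed in the grammar above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockMode {
    AccessShare,
    RowShare,
    RowExclusive,
    ShareUpdateExclusive,
    Share,
    ShareRowExclusive,
    Exclusive,
    AccessExclusive,
}

/// Maps the keywords found between `IN` and `MODE` to a lock mode.
/// Matching is case-insensitive; unknown sequences yield `None`.
pub fn parse_lock_mode(words: &[&str]) -> Option<LockMode> {
    let upper: Vec<String> = words.iter().map(|w| w.to_ascii_uppercase()).collect();
    let refs: Vec<&str> = upper.iter().map(String::as_str).collect();
    match refs.as_slice() {
        ["ACCESS", "SHARE"] => Some(LockMode::AccessShare),
        ["ROW", "SHARE"] => Some(LockMode::RowShare),
        ["ROW", "EXCLUSIVE"] => Some(LockMode::RowExclusive),
        ["SHARE", "UPDATE", "EXCLUSIVE"] => Some(LockMode::ShareUpdateExclusive),
        ["SHARE"] => Some(LockMode::Share),
        ["SHARE", "ROW", "EXCLUSIVE"] => Some(LockMode::ShareRowExclusive),
        ["EXCLUSIVE"] => Some(LockMode::Exclusive),
        ["ACCESS", "EXCLUSIVE"] => Some(LockMode::AccessExclusive),
        _ => None,
    }
}
```

Note that MySQL's `READ` and `WRITE` lock types do not appear here at all, which matches the point made in the thread that the two dialects' lock modes do not overlap.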
cd9919b
to
d5cde4f
Compare
Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look
See: https://www.postgresql.org/docs/current/sql-lock.html
PG's full syntax for this statement is supported:
It is implemented so as not to interfere with the roughly equivalent (but different) syntax in MySQL, by using a new Statement variant.