Parse Postgres's LOCK TABLE statement #1614

Draft: wants to merge 1 commit into main


freshtonic

See: https://www.postgresql.org/docs/current/sql-lock.html

PG's full syntax for this statement is supported:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

It is implemented as a new Statement variant so that it does not interfere with the roughly equivalent (but different) syntax in MySQL.
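A minimal sketch of how the grammar above might map onto an AST node whose `Display` impl round-trips the statement. The type names (`PgLockMode`, `PgLockTarget`, `PgLockTable`) are hypothetical and are not the PR's actual types; table names are plain `String`s rather than sqlparser's `ObjectName`. Because PostgreSQL attaches `ONLY` and `*` to each name in the list, they are modeled per target here.

```rust
use std::fmt;

/// The eight PostgreSQL table-level lock modes (hypothetical type name).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PgLockMode {
    AccessShare,
    RowShare,
    RowExclusive,
    ShareUpdateExclusive,
    Share,
    ShareRowExclusive,
    Exclusive,
    AccessExclusive,
}

impl fmt::Display for PgLockMode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            PgLockMode::AccessShare => "ACCESS SHARE",
            PgLockMode::RowShare => "ROW SHARE",
            PgLockMode::RowExclusive => "ROW EXCLUSIVE",
            PgLockMode::ShareUpdateExclusive => "SHARE UPDATE EXCLUSIVE",
            PgLockMode::Share => "SHARE",
            PgLockMode::ShareRowExclusive => "SHARE ROW EXCLUSIVE",
            PgLockMode::Exclusive => "EXCLUSIVE",
            PgLockMode::AccessExclusive => "ACCESS EXCLUSIVE",
        };
        f.write_str(s)
    }
}

/// One entry of the comma-separated list: `[ ONLY ] name [ * ]`.
#[derive(Debug, Clone, PartialEq, Eq)]
struct PgLockTarget {
    only: bool,
    name: String,
    star: bool,
}

/// `LOCK [ TABLE ] target [, ...] [ IN lockmode MODE ] [ NOWAIT ]`
#[derive(Debug, Clone, PartialEq, Eq)]
struct PgLockTable {
    has_table_keyword: bool,
    targets: Vec<PgLockTarget>,
    mode: Option<PgLockMode>,
    nowait: bool,
}

impl fmt::Display for PgLockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        // TABLE is optional in PostgreSQL, so only emit it if it was parsed.
        if self.has_table_keyword {
            f.write_str(" TABLE")?;
        }
        for (i, t) in self.targets.iter().enumerate() {
            f.write_str(if i == 0 { " " } else { ", " })?;
            if t.only {
                f.write_str("ONLY ")?;
            }
            f.write_str(&t.name)?;
            if t.star {
                f.write_str(" *")?;
            }
        }
        if let Some(mode) = self.mode {
            write!(f, " IN {mode} MODE")?;
        }
        if self.nowait {
            f.write_str(" NOWAIT")?;
        }
        Ok(())
    }
}
```

For example, a target list of `ONLY films` with `SHARE ROW EXCLUSIVE` and `NOWAIT` renders as `LOCK TABLE ONLY films IN SHARE ROW EXCLUSIVE MODE NOWAIT`.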

freshtonic force-pushed the james/cip-1063-add-lock-table-support-in-sqlparser branch from 7ce8f33 to 47a3e5e on December 20, 2024 05:38
iffyio (Contributor) left a comment

Thanks @freshtonic!

Comment on lines 9607 to 9613
let projection = if dialect_of!(self is PostgreSqlDialect | GenericDialect)
&& self.peek_keyword(Keyword::FROM)
{
vec![]
} else {
self.parse_projection()?
};
Contributor

we can probably skip these changes in this PR given it's now in #1613?

Author

@iffyio oh my bad - I should have branched from the apache main branch instead of our fork's main before pushing. I'll remedy this.

Author

freshtonic, Dec 23, 2024

@iffyio I tried this and it's not straightforward without storing a value on the variant that identifies the dialect that was used to parse the AST.

The following syntax would be problematic to render in Display:

LOCK customers;

In PG, the TABLE keyword is optional. In MySQL one of TABLE or TABLES is mandatory.

The Display impl for Statement, in the LockTable { .. } match arm could potentially generate SQL that will not be parsable by Postgres if a TABLES keyword is emitted.

Is there precedent for choosing how to render an AST fragment using a stored value to encode the dialect (or a proxy to the dialect) that was used to parse the AST?

Contributor

ah yeah, we could probably use an enum to represent the variants, something like this?

enum LockTableKind {
    Table,
    Tables,
}

// new field on the Statement::LockTable variant:
Statement::LockTable { table_kind: Option<LockTableKind>, .. }

see TableSampleKind for example
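A small sketch of that suggestion on a cut-down stand-in for the statement: `LockTable` and its `tables` field are hypothetical, and per-table lock types are omitted. Storing `Option<LockTableKind>` lets `Display` re-emit exactly the keyword that was parsed, so `LOCK customers` stays valid for Postgres and MySQL input keeps its mandatory keyword, without recording the dialect itself.

```rust
use std::fmt;

/// Which keyword, if any, followed `LOCK`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LockTableKind {
    Table,
    Tables,
}

impl fmt::Display for LockTableKind {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            LockTableKind::Table => f.write_str("TABLE"),
            LockTableKind::Tables => f.write_str("TABLES"),
        }
    }
}

/// Hypothetical stand-in for the statement: only the keyword and the table names.
struct LockTable {
    table_kind: Option<LockTableKind>,
    tables: Vec<String>,
}

impl fmt::Display for LockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        // `None` means the keyword was omitted (PostgreSQL's bare `LOCK name`).
        if let Some(kind) = self.table_kind {
            write!(f, " {kind}")?;
        }
        write!(f, " {}", self.tables.join(", "))
    }
}
```

With `table_kind: None` the statement renders as `LOCK customers`; with `Some(LockTableKind::Tables)` it renders as `LOCK TABLES ...`.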

src/ast/mod.rs (outdated, resolved)
freshtonic force-pushed the james/cip-1063-add-lock-table-support-in-sqlparser branch from 47a3e5e to cd9919b on January 4, 2025 12:43
Author

freshtonic commented Jan 4, 2025

@iffyio I've pushed another attempt at this.

Munging the Postgres and MySQL versions together into the same Statement::LockTables { .. } variant was painful: it required additional fields just so that the Display implementation could reliably produce a valid statement for both MySQL and Postgres.

In my opinion, introducing a LockTables enum (with one variant each for Postgres and MySQL, marked non_exhaustive to support other variations in the future) is less of a cognitive burden than inlining all variations as lots of optional fields on the same struct.

I should point out that this is a breaking change to the Statement::LockTables variant (the variant is now tuple-style, with a single LockTables field, instead of being a struct-style variant).

Because it is a breaking change, and because directly signalling the DB dialect in the AST does not appear to be an idiom used elsewhere, I don't have much confidence this PR will be accepted, but one can hope :)
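To illustrate what the breaking change means for downstream code, here is a hedged sketch: the payload types `PgLock` and `MySqlLock` and their fields are hypothetical, and `Statement` is reduced to two variants. Code that used to destructure a struct-style `Statement::LockTables { .. }` now has to match through the extra `LockTables` enum layer.

```rust
/// Hypothetical dialect-specific payloads; field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
struct PgLock {
    tables: Vec<String>,
    nowait: bool,
}

#[derive(Debug, Clone, PartialEq)]
struct MySqlLock {
    tables: Vec<(String, String)>, // (table name, lock type) pairs
}

#[derive(Debug, Clone, PartialEq)]
enum LockTables {
    Postgres(PgLock),
    MySql(MySqlLock),
}

#[derive(Debug, Clone, PartialEq)]
enum Statement {
    // Previously struct-style (`LockTables { .. }`); now a tuple-style
    // variant wrapping the dialect-specific enum.
    LockTables(LockTables),
    Commit,
}

/// Downstream consumers must handle each dialect arm separately.
fn locked_table_names(stmt: &Statement) -> Vec<&str> {
    match stmt {
        Statement::LockTables(LockTables::Postgres(pg)) => {
            pg.tables.iter().map(String::as_str).collect()
        }
        Statement::LockTables(LockTables::MySql(my)) => {
            my.tables.iter().map(|(t, _)| t.as_str()).collect()
        }
        _ => Vec::new(),
    }
}
```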

/// ```sql
/// UNLOCK TABLES
/// ```
/// Note: this is a MySQL-specific statement. See <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
UnlockTables,
UnlockTables(bool),
Contributor

Suggested change
UnlockTables(bool),
UnlockTables(UnlockTables),

Maybe we could use a dedicated struct here as well? It's not clear what the bool property implies otherwise.
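A sketch of what such a struct could look like. The thread does not say what the `bool` means; this assumes, hypothetically, that it records whether the plural `TABLES` keyword was written, since MySQL accepts both `UNLOCK TABLE` and `UNLOCK TABLES`. The field name `pluralized_table_keyword` is borrowed from the reviewer's sketch of the merged struct in this thread.

```rust
use std::fmt;

/// Hypothetical replacement for `UnlockTables(bool)`.
/// Assumption: the flag records which keyword was written.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct UnlockTables {
    pluralized_table_keyword: bool,
}

impl fmt::Display for UnlockTables {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        if self.pluralized_table_keyword {
            f.write_str("UNLOCK TABLES")
        } else {
            f.write_str("UNLOCK TABLE")
        }
    }
}
```

At a call site, `Statement::UnlockTables(UnlockTables { pluralized_table_keyword: true })` documents itself in a way that `Statement::UnlockTables(true)` does not.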

pub struct LockTable {
pub table: Ident,
#[non_exhaustive]
pub enum LockTables {
Contributor

I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects. Representation wise, I think both variants can be merged into a struct with something like the following?

struct TableLock {
  pub table: ObjectName,
  pub alias: Option<Ident>,
  pub lock_type: Option<LockTableType>,
}
struct LockTable {
  pluralized_table_keyword: Option<bool>, // If None, then no table keyword was provided
  locks: Vec<TableLock>,
  lock_mode: Option<LockTableType>,
  only: bool,
  no_wait: bool
}

Similarly, the parser can use the same implementation to construct the struct for both dialects.
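A runnable sketch of the merged representation above, with simplified stand-ins: `String` instead of `ObjectName`/`Ident`, and a `LockTableType` reduced to a few hypothetical MySQL and PostgreSQL variants. A single `Display` impl renders both dialects' forms from the same struct.

```rust
use std::fmt;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LockTableType {
    // MySQL per-table lock types
    Read { local: bool },
    Write,
    // A subset of PostgreSQL's statement-wide lock modes
    AccessShare,
    AccessExclusive,
}

impl fmt::Display for LockTableType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(match self {
            LockTableType::Read { local: false } => "READ",
            LockTableType::Read { local: true } => "READ LOCAL",
            LockTableType::Write => "WRITE",
            LockTableType::AccessShare => "ACCESS SHARE",
            LockTableType::AccessExclusive => "ACCESS EXCLUSIVE",
        })
    }
}

struct TableLock {
    table: String, // stand-in for ObjectName
    alias: Option<String>,
    lock_type: Option<LockTableType>, // MySQL: lock type per table
}

struct LockTable {
    pluralized_table_keyword: Option<bool>, // None: no keyword; Some(false): TABLE; Some(true): TABLES
    locks: Vec<TableLock>,
    lock_mode: Option<LockTableType>, // PostgreSQL: `IN ... MODE` for all tables
    only: bool,
    no_wait: bool,
}

impl fmt::Display for LockTable {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("LOCK")?;
        match self.pluralized_table_keyword {
            Some(true) => f.write_str(" TABLES")?,
            Some(false) => f.write_str(" TABLE")?,
            None => {}
        }
        if self.only {
            f.write_str(" ONLY")?;
        }
        for (i, lock) in self.locks.iter().enumerate() {
            f.write_str(if i == 0 { " " } else { ", " })?;
            f.write_str(&lock.table)?;
            if let Some(alias) = &lock.alias {
                write!(f, " AS {alias}")?;
            }
            if let Some(lock_type) = lock.lock_type {
                write!(f, " {lock_type}")?;
            }
        }
        if let Some(mode) = self.lock_mode {
            write!(f, " IN {mode} MODE")?;
        }
        if self.no_wait {
            f.write_str(" NOWAIT")?;
        }
        Ok(())
    }
}
```

Following the sketch, `only` is statement-level here and is emitted once after the keyword; PostgreSQL's grammar actually allows `ONLY` before each name in the list, which a per-`TableLock` flag would capture more faithfully. The same struct can also hold combinations neither dialect accepts (for example `TABLES` together with `NOWAIT`), which is the trade-off debated in this thread.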

Author

> I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects

MySQL's & Postgres's LOCK statements have minimal overlap. They are similar in name only.

  • MySQL allows a different lock mode per table, whereas Postgres applies one lock mode to all tables
  • MySQL's lock modes and Postgres's lock modes do not overlap at all
  • MySQL has freely interchangeable table keywords, one of which MUST be present: TABLE or TABLES
  • Postgres has one TABLE keyword but it's optional
  • Postgres supports additional (optional) ONLY and NOWAIT keywords
  • It is never valid to mix and match the Postgres-specific syntax with MySQL-specific syntax.
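The constraints in this list can be checked mechanically. The following sketch assumes a hypothetical, heavily simplified `LockTableShape` that records only the dialect-relevant parts of a merged struct; it classifies a statement as MySQL-style or PostgreSQL-style and rejects mixes. The rules encoded are the bullets above plus MySQL's requirement that every table name be followed by a lock type.

```rust
/// Hypothetical, simplified view of a merged `LOCK` statement: only the parts
/// that distinguish MySQL syntax from PostgreSQL syntax are kept.
struct LockTableShape {
    /// None: no keyword; Some(false): TABLE; Some(true): TABLES
    pluralized_table_keyword: Option<bool>,
    /// One entry per table: does that table carry a MySQL lock type (READ, WRITE, ...)?
    per_table_lock_types: Vec<bool>,
    /// PostgreSQL's statement-wide `IN lockmode MODE`
    statement_lock_mode: bool,
    only: bool,
    no_wait: bool,
}

#[derive(Debug, PartialEq, Eq)]
enum Flavor {
    MySql,
    Postgres,
}

fn classify(s: &LockTableShape) -> Result<Flavor, &'static str> {
    let any_lock_type = s.per_table_lock_types.iter().any(|&t| t);
    let all_lock_types =
        !s.per_table_lock_types.is_empty() && s.per_table_lock_types.iter().all(|&t| t);

    // Syntax only MySQL accepts: the TABLES keyword, or lock types attached to tables.
    let mysql_marker = s.pluralized_table_keyword == Some(true) || any_lock_type;
    // Syntax only PostgreSQL accepts: omitting the keyword, IN ... MODE, ONLY, NOWAIT.
    let pg_marker = s.pluralized_table_keyword.is_none()
        || s.statement_lock_mode
        || s.only
        || s.no_wait;

    if mysql_marker && pg_marker {
        return Err("mixes MySQL-only and PostgreSQL-only syntax");
    }
    if mysql_marker {
        // MySQL requires a lock type after every table name.
        return if all_lock_types {
            Ok(Flavor::MySql)
        } else {
            Err("MySQL requires a lock type for every table")
        };
    }
    // `LOCK TABLE name [, ...]` with nothing else is only valid in PostgreSQL.
    Ok(Flavor::Postgres)
}
```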

I agree that explicit database-specific AST fragments make parser reuse more difficult but in the case of LOCK pretty much nothing is reusable.

The db-specific AST pieces do make implementing Display a lot less error prone.

It also makes it (almost) impossible to represent an invalid AST (e.g. a mix of PG & MySQL), except that I didn't take a hard stance on this for LockTableType, which does mix MySQL & Postgres bits.

None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.

Author

> None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.

What I mean by this:

When pieces of dialect-specific syntax are mixed in the same AST struct/enum variant the consumers have to understand the dialect-specific differences in order to know which fields they can safely ignore.

At CipherStash I wrote a type-inferencer for SQL statements in order to determine if specific transformations can be performed safely. It uses sqlparser's AST and there are a lot of cases where I had to spend time understanding which AST node fields or combinations of field values I can safely ignore when I'm only targeting Postgres.

Contributor

Yeah, I agree that downstream crates targeting a single dialect would be easier to implement with essentially dialect-specific AST representations (at the other extreme, there are downstream crates that want to process the AST in a dialect-agnostic manner, and we also have custom dialects in other downstream crates that need support). I think there are pros and cons to this approach vs. the current one followed by the parser, which puts some of the responsibility on the downstream crate. In any case, I think we would ideally keep to the current approach for this PR, while a shift in approach could be tackled as its own dedicated proposal.

Contributor

I agree with @iffyio that following the existing patterns in this crate is the best thing for this PR

If we were starting a new project, dialect-specific AST structures would be worth considering, but at this point ensuring the code is consistent is more important in my opinion.

#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
#[non_exhaustive]
Contributor

Suggested change
#[non_exhaustive]

I think we tend not to use this attribute. There are pros and cons to using it, but it's better to keep with the existing convention in this PR.

See: https://www.postgresql.org/docs/current/sql-lock.html

PG's full syntax for this statement is supported:

```
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
```

MySQL and Postgres support very different syntax for `LOCK TABLE`,
and are implemented with a breaking change on the
`Statement::LockTables { .. }` variant, turning the variant into one
which accepts a `LockTables` enum with variants for MySQL and Postgres.
freshtonic force-pushed the james/cip-1063-add-lock-table-support-in-sqlparser branch from cd9919b to d5cde4f on January 5, 2025 23:20
Contributor

alamb commented Jan 21, 2025

Marking as draft as I think this PR is no longer waiting on feedback. Please mark it as ready for review when it is ready for another look

alamb marked this pull request as draft on January 21, 2025 09:09

3 participants