Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parse Postgres's LOCK TABLE statement #1614

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
182 changes: 167 additions & 15 deletions src/ast/mod.rs
Original file line number Diff line number Diff line change
Expand Up @@ -3341,16 +3341,13 @@ pub enum Statement {
value: Option<Value>,
is_eq: bool,
},
/// ```sql
/// LOCK TABLES <table_name> [READ [LOCAL] | [LOW_PRIORITY] WRITE]
/// ```
/// Note: this is a MySQL-specific statement. See <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
LockTables { tables: Vec<LockTable> },
/// See [`LockTables`].
LockTables(LockTables),
/// ```sql
/// UNLOCK TABLES
/// ```
/// Note: this is a MySQL-specific statement. See <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
UnlockTables,
UnlockTables(bool),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
UnlockTables(bool),
UnlockTables(UnlockTables),

maybe we use a dedicated struct here as well? Its not clear what the bool property implies otherwise

/// ```sql
/// UNLOAD(statement) TO <destination> [ WITH options ]
/// ```
Expand Down Expand Up @@ -4925,11 +4922,15 @@ impl fmt::Display for Statement {
}
Ok(())
}
Statement::LockTables { tables } => {
write!(f, "LOCK TABLES {}", display_comma_separated(tables))
Statement::LockTables(lock_tables) => {
write!(f, "{}", lock_tables)
}
Statement::UnlockTables => {
write!(f, "UNLOCK TABLES")
Statement::UnlockTables(pluralized) => {
if *pluralized {
write!(f, "UNLOCK TABLES")
} else {
write!(f, "UNLOCK TABLE")
}
}
Statement::Unload { query, to, with } => {
write!(f, "UNLOAD({query}) TO {to}")?;
Expand Down Expand Up @@ -7278,16 +7279,126 @@ impl fmt::Display for SearchModifier {
}
}

/// A `LOCK TABLE ..` statement. MySQL and Postgres variants are supported.
///
/// The MySQL and Postgres syntax variants are significant enough that they
/// are explicitly represented as enum variants. In order to support additional
/// databases in the future, this enum is marked as `#[non_exhaustive]`.
///
/// In MySQL, when multiple tables are mentioned in the statement the lock mode
/// can vary per table.
///
/// In contrast, Postgres only allows specifying a single mode which is applied
/// to all mentioned tables.
///
/// MySQL: see <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
///
/// ```sql
/// LOCK [TABLE | TABLES] name [[AS] alias] locktype [,name [[AS] alias] locktype]
/// ````
///
/// Where *locktype* is:
/// ```sql
/// READ [LOCAL] | [LOW_PRIORITY] WRITE
/// ```
///
/// Postgres: See <https://www.postgresql.org/docs/current/sql-lock.html>
///
/// ```sql
/// LOCK [ TABLE ] [ ONLY ] name [, ...] [ IN lockmode MODE ] [ NOWAIT ]
/// ```
/// Where *lockmode* is one of:
///
/// ```sql
/// ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
/// | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
/// ``````
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct LockTable {
pub table: Ident,
#[non_exhaustive]
pub enum LockTables {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects. Representation wise, I think both variants can be merged into a struct with something like the following?

struct TableLock {
  pub table: ObjectName,
  pub alias: Option<Ident>,
  pub lock_type: Option<LockTableType>,
}
struct LockTable {
  pluralized_table_keyword: Option<bool>, // If None, then no table keyword was provided
  locks: Vec<TableLock>,
  lock_mode: Option<LockTableType>,
  only: bool,
  no_wait: bool
}

similarly, the parser uses the same impl to create the struct

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we want to avoid variants that are specific to dialects, those tend to make it more difficult to reuse the parser code and ast representations across dialects

MySQL's & Postgres's LOCK statements have minimal overlap. They are similar in name only.

  • MySQL allows different lock modes per table vs Postgres one lock mode applied to all tables
  • MySQL's lock modes and Postgres's lock modes do not overlap at all
  • MySQL has freely interchangeable table keywords, one of which MUST be present: TABLE or TABLES
  • Postgres has one TABLE keyword but it's optional
  • Postgres supports additional (optional) ONLY and NOWAIT keywords
  • It is never valid to mix and match the Postgres-specific syntax with MySQL-specific syntax.

I agree that explicit database-specific AST fragments make parser reuse more difficult but in the case of LOCK pretty much nothing is reusable.

The db-specific AST pieces do make implementing Display a lot less error prone.

It also makes it (almost) impossible be able to represent invalid AST (e.g. a mix of PG & MySQL) except I didn't take a hard stance on this for LockTableType which does mix MySQL & Postgres bits.

None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of this is a hill I will die on, but handling the burden of grammar differences in the parser and AST design means consuming the AST correctly in downstream projects will be easier.

What I mean by this:

When pieces of dialect-specific syntax are mixed in the same AST struct/enum variant the consumers have to understand the dialect-specific differences in order to know which fields they can safely ignore.

At CipherStash I wrote a type-inferencer for SQL statements in order to determine if specific transformations can be performed safely. It uses sqlparser's AST and there are a lot of cases where I had to spend time understanding which AST node fields or combinations of field values I can safely ignore when I'm only targeting Postgres.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree that downstream crates targeting a single dialect would be easier to implement by essentially having dialect specific AST representations (on the other extreme there are downstream crates that would like to process the AST in a dialect agnostic manner,, we also have custom dialects in other downstream crates that need support). I think there are pros/cons to this approach vs the current one followed by the parser which puts some of the responsibility on the downstream crate. I'm thinking in any case ideally we would want to keep to the current approach for the PR while shift in approaches could be tackled as its own dedicated proposal.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @iffyio that following the existing patterns in this crate is the best thing for this PR

If we were starting a new project having dialect specific AST structures would make sense to consider but at this point ensuring the code is consistent is more important in my opinion

/// The MySQL syntax variant
MySql {
/// Whether the `TABLE` or `TABLES` keyword was used.
pluralized_table_keyword: bool,
/// The tables to lock and their per-table lock mode.
tables: Vec<MySqlTableLock>,
},

/// The Postgres syntax variant.
Postgres {
/// One or more optionally schema-qualified table names to lock.
tables: Vec<ObjectName>,
/// The lock type applied to all mentioned tables.
lock_mode: Option<LockTableType>,
/// Whether the optional `TABLE` keyword was present (to support round-trip parse & render)
has_table_keyword: bool,
/// Whether the `ONLY` option was specified.
only: bool,
/// Whether the `NOWAIT` option was specified.
no_wait: bool,
},
}

impl Display for LockTables {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
match self {
LockTables::MySql {
pluralized_table_keyword,
tables,
} => {
write!(
f,
"LOCK {tbl_kwd} ",
tbl_kwd = if *pluralized_table_keyword {
"TABLES"
} else {
"TABLE"
}
)?;
write!(f, "{}", display_comma_separated(tables))?;
Ok(())
}
LockTables::Postgres {
tables,
lock_mode,
has_table_keyword,
only,
no_wait,
} => {
write!(
f,
"LOCK{tbl_kwd}",
tbl_kwd = if *has_table_keyword { " TABLE" } else { "" }
)?;
if *only {
write!(f, " ONLY")?;
}
write!(f, " {}", display_comma_separated(tables))?;
if let Some(lock_mode) = lock_mode {
write!(f, " IN {} MODE", lock_mode)?;
}
if *no_wait {
write!(f, " NOWAIT")?;
}
Ok(())
}
}
}
}

/// A locked table from a MySQL `LOCK TABLE` statement.
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
pub struct MySqlTableLock {
pub table: ObjectName,
pub alias: Option<Ident>,
pub lock_type: LockTableType,
pub lock_type: Option<LockTableType>,
}

impl fmt::Display for LockTable {
impl fmt::Display for MySqlTableLock {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
let Self {
table: tbl_name,
Expand All @@ -7299,17 +7410,34 @@ impl fmt::Display for LockTable {
if let Some(alias) = alias {
write!(f, "AS {alias} ")?;
}
write!(f, "{lock_type}")?;
if let Some(lock_type) = lock_type {
write!(f, "{lock_type}")?;
}
Ok(())
}
}

/// Table lock types.
///
/// `Read` & `Write` are MySQL-specfic.
///
/// AccessShare, RowShare, RowExclusive, ShareUpdateExclusive, Share,
/// ShareRowExclusive, Exclusive, AccessExclusive are Postgres-specific.
#[derive(Debug, Clone, PartialEq, PartialOrd, Eq, Ord, Hash)]
#[cfg_attr(feature = "serde", derive(Serialize, Deserialize))]
#[cfg_attr(feature = "visitor", derive(Visit, VisitMut))]
#[non_exhaustive]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[non_exhaustive]

I think we tend to not use this attribute, I think there are pros/cons with using it but better to keep with the existing convention in this PR

pub enum LockTableType {
Read { local: bool },
Write { low_priority: bool },
AccessShare,
RowShare,
RowExclusive,
ShareUpdateExclusive,
Share,
ShareRowExclusive,
Exclusive,
AccessExclusive,
}

impl fmt::Display for LockTableType {
Expand All @@ -7327,6 +7455,30 @@ impl fmt::Display for LockTableType {
}
write!(f, "WRITE")?;
}
Self::AccessShare => {
write!(f, "ACCESS SHARE")?;
}
Self::RowShare => {
write!(f, "ROW SHARE")?;
}
Self::RowExclusive => {
write!(f, "ROW EXCLUSIVE")?;
}
Self::ShareUpdateExclusive => {
write!(f, "SHARE UPDATE EXCLUSIVE")?;
}
Self::Share => {
write!(f, "SHARE")?;
}
Self::ShareRowExclusive => {
write!(f, "SHARE ROW EXCLUSIVE")?;
}
Self::Exclusive => {
write!(f, "EXCLUSIVE")?;
}
Self::AccessExclusive => {
write!(f, "ACCESS EXCLUSIVE")?;
}
}

Ok(())
Expand Down
2 changes: 1 addition & 1 deletion src/ast/spans.rs
Original file line number Diff line number Diff line change
Expand Up @@ -479,7 +479,7 @@ impl Spanned for Statement {
Statement::CreateType { .. } => Span::empty(),
Statement::Pragma { .. } => Span::empty(),
Statement::LockTables { .. } => Span::empty(),
Statement::UnlockTables => Span::empty(),
Statement::UnlockTables(_) => Span::empty(),
Statement::Unload { .. } => Span::empty(),
Statement::OptimizeTable { .. } => Span::empty(),
Statement::CreatePolicy { .. } => Span::empty(),
Expand Down
37 changes: 25 additions & 12 deletions src/dialect/mysql.rs
Original file line number Diff line number Diff line change
Expand Up @@ -19,7 +19,7 @@
use alloc::boxed::Box;

use crate::{
ast::{BinaryOperator, Expr, LockTable, LockTableType, Statement},
ast::{BinaryOperator, Expr, LockTableType, LockTables, MySqlTableLock, Statement},
dialect::Dialect,
keywords::Keyword,
parser::{Parser, ParserError},
Expand Down Expand Up @@ -81,10 +81,14 @@ impl Dialect for MySqlDialect {
}

fn parse_statement(&self, parser: &mut Parser) -> Option<Result<Statement, ParserError>> {
if parser.parse_keywords(&[Keyword::LOCK, Keyword::TABLES]) {
Some(parse_lock_tables(parser))
if parser.parse_keywords(&[Keyword::LOCK, Keyword::TABLE]) {
Some(parse_lock_tables(parser, false))
} else if parser.parse_keywords(&[Keyword::LOCK, Keyword::TABLES]) {
Some(parse_lock_tables(parser, true))
} else if parser.parse_keywords(&[Keyword::UNLOCK, Keyword::TABLE]) {
Some(parse_unlock_tables(parser, false))
} else if parser.parse_keywords(&[Keyword::UNLOCK, Keyword::TABLES]) {
Some(parse_unlock_tables(parser))
Some(parse_unlock_tables(parser, true))
} else {
None
}
Expand All @@ -106,22 +110,28 @@ impl Dialect for MySqlDialect {

/// `LOCK TABLES`
/// <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
fn parse_lock_tables(parser: &mut Parser) -> Result<Statement, ParserError> {
fn parse_lock_tables(
parser: &mut Parser,
pluralized_table_keyword: bool,
) -> Result<Statement, ParserError> {
let tables = parser.parse_comma_separated(parse_lock_table)?;
Ok(Statement::LockTables { tables })
Ok(Statement::LockTables(LockTables::MySql {
pluralized_table_keyword,
tables,
}))
}

// tbl_name [[AS] alias] lock_type
fn parse_lock_table(parser: &mut Parser) -> Result<LockTable, ParserError> {
let table = parser.parse_identifier()?;
fn parse_lock_table(parser: &mut Parser) -> Result<MySqlTableLock, ParserError> {
let table = parser.parse_object_name(false)?;
let alias =
parser.parse_optional_alias(&[Keyword::READ, Keyword::WRITE, Keyword::LOW_PRIORITY])?;
let lock_type = parse_lock_tables_type(parser)?;

Ok(LockTable {
Ok(MySqlTableLock {
table,
alias,
lock_type,
lock_type: Some(lock_type),
})
}

Expand All @@ -146,6 +156,9 @@ fn parse_lock_tables_type(parser: &mut Parser) -> Result<LockTableType, ParserEr

/// UNLOCK TABLES
/// <https://dev.mysql.com/doc/refman/8.0/en/lock-tables.html>
fn parse_unlock_tables(_parser: &mut Parser) -> Result<Statement, ParserError> {
Ok(Statement::UnlockTables)
fn parse_unlock_tables(
_parser: &mut Parser,
pluralized_table: bool,
) -> Result<Statement, ParserError> {
Ok(Statement::UnlockTables(pluralized_table))
}
56 changes: 55 additions & 1 deletion src/dialect/postgresql.rs
Original file line number Diff line number Diff line change
Expand Up @@ -28,12 +28,15 @@
// limitations under the License.
use log::debug;

use crate::ast::{ObjectName, Statement, UserDefinedTypeRepresentation};
use crate::ast::{LockTableType, LockTables, ObjectName, Statement, UserDefinedTypeRepresentation};
use crate::dialect::{Dialect, Precedence};
use crate::keywords::Keyword;
use crate::parser::{Parser, ParserError};
use crate::tokenizer::Token;

#[cfg(not(feature = "std"))]
use alloc::vec::Vec;

/// A [`Dialect`] for [PostgreSQL](https://www.postgresql.org/)
#[derive(Debug)]
pub struct PostgreSqlDialect {}
Expand Down Expand Up @@ -139,6 +142,9 @@ impl Dialect for PostgreSqlDialect {
if parser.parse_keyword(Keyword::CREATE) {
parser.prev_token(); // unconsume the CREATE in case we don't end up parsing anything
parse_create(parser)
} else if parser.parse_keyword(Keyword::LOCK) {
parser.prev_token(); // unconsume the LOCK in case we don't end up parsing anything
Some(parse_lock_table(parser))
} else {
None
}
Expand Down Expand Up @@ -276,3 +282,51 @@ pub fn parse_create_type_as_enum(
representation: UserDefinedTypeRepresentation::Enum { labels },
})
}

pub fn parse_lock_table(parser: &mut Parser) -> Result<Statement, ParserError> {
parser.expect_keyword(Keyword::LOCK)?;
let has_table_keyword = parser.parse_keyword(Keyword::TABLE);
let only = parser.parse_keyword(Keyword::ONLY);
let tables: Vec<ObjectName> =
parser.parse_comma_separated(|parser| parser.parse_object_name(false))?;
let lock_mode = parse_lock_mode(parser)?;
let no_wait = parser.parse_keyword(Keyword::NOWAIT);

Ok(Statement::LockTables(LockTables::Postgres {
tables,
lock_mode,
has_table_keyword,
only,
no_wait,
}))
}

pub fn parse_lock_mode(parser: &mut Parser) -> Result<Option<LockTableType>, ParserError> {
if !parser.parse_keyword(Keyword::IN) {
return Ok(None);
}

let lock_mode = if parser.parse_keywords(&[Keyword::ACCESS, Keyword::SHARE]) {
LockTableType::AccessShare
} else if parser.parse_keywords(&[Keyword::ACCESS, Keyword::EXCLUSIVE]) {
LockTableType::AccessExclusive
} else if parser.parse_keywords(&[Keyword::EXCLUSIVE]) {
LockTableType::Exclusive
} else if parser.parse_keywords(&[Keyword::ROW, Keyword::EXCLUSIVE]) {
LockTableType::RowExclusive
} else if parser.parse_keywords(&[Keyword::ROW, Keyword::SHARE]) {
LockTableType::RowShare
} else if parser.parse_keywords(&[Keyword::SHARE, Keyword::ROW, Keyword::EXCLUSIVE]) {
LockTableType::ShareRowExclusive
} else if parser.parse_keywords(&[Keyword::SHARE, Keyword::UPDATE, Keyword::EXCLUSIVE]) {
LockTableType::ShareUpdateExclusive
} else if parser.parse_keywords(&[Keyword::SHARE]) {
LockTableType::Share
} else {
return Err(ParserError::ParserError("Expected: ACCESS EXCLUSIVE | ACCESS SHARE | EXCLUSIVE | ROW EXCLUSIVE | ROW SHARE | SHARE | SHARE ROW EXCLUSIVE | SHARE ROW EXCLUSIVE".into()));
};

parser.expect_keyword(Keyword::MODE)?;

Ok(Some(lock_mode))
}
Loading
Loading