-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enum value parsing: do not parse by whitespace #15493
Enum value parsing: do not parse by whitespace #15493
Conversation
Signed-off-by: Shlomi Noach <[email protected]>
Review ChecklistHello reviewers! 👋 Please follow this checklist when reviewing this Pull Request. General
Tests
Documentation
New flags
If a workflow is added or modified:
Backward compatibility
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #15493 +/- ##
==========================================
- Coverage 67.41% 65.67% -1.75%
==========================================
Files 1560 1563 +3
Lines 192752 194476 +1724
==========================================
- Hits 129952 127714 -2238
- Misses 62800 66762 +3962 ☔ View full report in Codecov by Sentry. |
go/vt/schema/parser.go
Outdated
@@ -122,7 +121,7 @@ func parseEnumOrSetTokens(enumOrSetValues string) (tokens []string) { | |||
// input should not contain `enum(...)` column definition, just the comma delimited list | |||
return tokens | |||
} | |||
tokens = textutil.SplitDelimitedList(enumOrSetValues) | |||
tokens = strings.Split(enumOrSetValues, ",") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason(s) not to use the csv encoding packing in the stdlib? https://pkg.go.dev/encoding/csv
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main issue would be that quoting works differently there with "
and not '
like in SQL. Also escaping of quotes is different, so we'd have to write something specific for this or leverage instead sqlparser
as @shlomi-noach mentioned.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I did notice that the csv encoder simply removes any commas in the values which surprised me. I do think that we should take this opportunity to address this more fully though as MySQL happily accepts enum string values with commas:
mysql> create table t1 (id int not null auto_increment primary key, ev enum('hi there', 'with, comma'));
Query OK, 0 rows affected (0.02 sec)
mysql> show create table t1;
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| t1 | CREATE TABLE `t1` (
`id` int NOT NULL AUTO_INCREMENT,
`ev` enum('hi there','with, comma') DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci |
+-------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.01 sec)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, there's a bunch of things that still fail here. I agree we should fix this properly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shlomi-noach I pushed up a little specific parser for this in f90e797 that deals also with quotes inside the values as well.
I also added some low level test there with stuff like newlines, I think we should also add those to the end-to-end test too to try all the exotic things possible here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized my parser was not enough and also added a proper decoder for string values so we handle things line newlines correctly.
This handles cases like nested quotes etc. in the input. Signed-off-by: Dirkjan Bussink <[email protected]>
f90e797
to
53b09f7
Compare
Signed-off-by: Dirkjan Bussink <[email protected]>
@@ -333,35 +333,35 @@ func TestGetExpandedColumnNames(t *testing.T) { | |||
"expand enum", | |||
Column{ | |||
Type: EnumColumnType, | |||
EnumValues: "'a', 'b'", | |||
EnumValues: "'a','b'", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shlomi-noach it looks like MySQL in neither information_schema.columns
or when using SHOW CREATE TABLE
includes a space after the ,
. The new parser doesn't accept that, do you know if we depend anywhere outside of tests on allowing that space?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing depends on existence of spaces. Part of the fix here is to actually assume they never exist, as you say.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional changes in BufDecodeStringSQL
and parseEnumOrSetTokens
look good to me.
Proposed followup in #17133 |
Description
Addressing #15492, this is an initial simple fix that allows whitespace in
enum
value. A more formal fix to follow, as this fix is incomplete in that it still requires more escaping.A complete fix is likely to benefit from some
sqlparser
capabilities, though the use case goes beyond parsingCREATE TABLE
statements and thus we might still take a lesser-than-yacc
approach.Related Issue(s)
#15492
Checklist
Deployment Notes