-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Migrate from spansql to memefish #66
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is no wrong implementation.
memefish itself doesn't target usecase as SQL builder, so I think dynamic SQL is OK.
I expects there is some children nodes are not handled, but it will be handled in issues and following PRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Thank you for the changes to memefish. I just left comments on a few minor parts.
internal/hammer/diff.go
Outdated
m[stmt.Name] = t | ||
case *spansql.CreateIndex: | ||
if t, ok := m[stmt.Table]; ok { | ||
m[stmt.Name.SQL()] = t |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI
(*ast.Path).SQL()
emits quoted identifier like
`group`.`table`
so it may be better to check len(path.Idents()) == 1
and add test cases with named schemas and names which is the same to keywords.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is expected after checking len(path.Idents()) == 1
?
I think the keys of m := make(map[string]*Table)
can include identifiers with named schemas, not just table names.
add test cases with named schemas and names which is the same to keywords.
I added test cases for named schema and table name that is keywords! (Although, CREATE SCHEME
is not yet supported with hammer).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I want to say is that originally hammer did not support named schema,
but do you really want to support named schema in this memefish migration PR?
If it is broken, I recommend that you reject the general path for now and consider fixing it in a separate PR. If it is possible to do so, I think it is fine to do it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As a result, named schema may be supported, but it only means that the identifier is now treated as an identifier, even if the identifier is just the table name or has a named schema.
I don't know if the original spansql.ID
included named schema or not. However, I believe that the intention of the code here is not to use the table name, but to use the identifier. There may be differences in behaviour for named schema, but the intent seems to be consistent and backwards compatible.
We could intentionally remove support for named schema, but then we would have more code to remove named schema, including indexes and so on. IMO, I think that if the case of len(path.Idents()) == 3
were to appear in the future, we wouldn't have to exclude it here.
Anyway, I leave the final decision to @daichirata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI
m[stmt.Name.SQL()]
can be m["`group`.`order`"]
for CREATE TABLE `group`.`order`
(group
and order
are keyword.).
But currently, GetDatabaseDdl
will emit CREATE TABLE `group.order`
for that table definition so the map will contain m["`group.order`"]
.
https://issuetracker.google.com/issues/385901554
I believe this inconsistency can break diffs using spanner:
prefix of hammer.
I recommend to unquote path identifiers as likestrings.Join(path.Name.Idents[].Name, ".")
(pseudo code).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, I can't understand your opinion that path.SQL()
is suitable for map key than dot separated raw identifiers.
Quoted identifiers are only GoogleSQL lexical representation for reserved identifiers.
In other places like INFORMATION_SCHEMA
, no quoted idenfires are used.
This reminds me of the slogan "サニタイズ言うな" (Don't say sanitize) that was popular in the Japanese engineering community in the past.
In this slogal says, we shouldn't store escaped representation, escaping (using .SQL()
) should be performed only in last one mile.
However, it's clear that we won't be addressing named schema support in this PR. Regarding how to handle map keys, I'll defer to the owner's review on that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the good point about named schema!
Therefore, there shouldn’t be any breaking changes until hammer supports the CREATE SCHEMA DDL, and we can wait for GetDatabaseDdl to be fixed before adding support for CREATE SCHEMA.
I agree with this opinion. Since the CREATE SCHEMA statement cannot currently be parsed, I believe this issue does not manifest for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The key of this map is not required to be correct Cloud Spanner table name (or table identifier), but just an identifier of the table.
It means we can also use anything that is "same tables output the same, and different tables output different", like "<schema>.<table>"
, [2]string{"<schema>", "<table>"}
, or even "<schema>🔧<table>"
.
So, at least me, I don't care how actually SQL()
format the table names. I just expect SQL()
outputs the same string for the same table (and ASTs generally satisfies that).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with both sides on many points. But I also thought that even if it is actually useful, the better case is that the table identity is not affected by the SQL character representation.
If you were to write code on the hammer side that performs joins each time using a map with Idents as keys, would the scope of changes become significant? @morikuni
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pushed a commit to stop usingSQL()
for comparison: 18a5bd3
Revert it if not needed.
|
||
require ( | ||
cloud.google.com/go/spanner v1.62.0 | ||
github.com/cloudspannerecosystem/memefish v0.0.0-20241219043423-1efca7ff9732 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it is better to use released version
fix: #65
There are some DDLs that cannot be parsed by spansql, so I migrated to memefish while trying to maintain as much compatibility as possible. Due to differences in SQL formatting between spansql and memefish, there are significant diffs in the outputs. However, I believe the functionality remains unchanged.
spqnsqlではパースできないDDLがあるので、できるだけ互換性を保ったままmemefishに移行した。spansqlとmemefishのSQLのフォーマットの違いにより、diffに差分が大きく出ているが、動作は変わっていないと思う。