Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Migrate from spansql to memefish #66

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

morikuni
Copy link

fix: #65

There are some DDLs that cannot be parsed by spansql, so I migrated to memefish while trying to maintain as much compatibility as possible. Due to differences in SQL formatting between spansql and memefish, there are significant diffs in the outputs. However, I believe the functionality remains unchanged.

spqnsqlではパースできないDDLがあるので、できるだけ互換性を保ったままmemefishに移行した。spansqlとmemefishのSQLのフォーマットの違いにより、diffに差分が大きく出ているが、動作は変わっていないと思う。

Copy link

@apstndb apstndb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is no wrong implementation.
memefish itself doesn't target usecase as SQL builder, so I think dynamic SQL is OK.
I expects there is some children nodes are not handled, but it will be handled in issues and following PRs.

@morikuni morikuni mentioned this pull request Dec 17, 2024
Copy link
Owner

@daichirata daichirata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Thank you for the changes to memefish. I just left comments on a few minor parts.

internal/hammer/diff.go Outdated Show resolved Hide resolved
internal/hammer/diff.go Outdated Show resolved Hide resolved
@morikuni morikuni requested a review from daichirata December 19, 2024 12:47
m[stmt.Name] = t
case *spansql.CreateIndex:
if t, ok := m[stmt.Table]; ok {
m[stmt.Name.SQL()] = t
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI
(*ast.Path).SQL() emits quoted identifier like

`group`.`table`

so it may be better to check len(path.Idents()) == 1 and add test cases with named schemas and names which is the same to keywords.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is expected after checking len(path.Idents()) == 1 ?
I think the keys of m := make(map[string]*Table) can include identifiers with named schemas, not just table names.

add test cases with named schemas and names which is the same to keywords.

I added test cases for named schema and table name that is keywords! (Although, CREATE SCHEME is not yet supported with hammer).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What I want to say is that originally hammer did not support named schema,
but do you really want to support named schema in this memefish migration PR?
If it is broken, I recommend that you reject the general path for now and consider fixing it in a separate PR. If it is possible to do so, I think it is fine to do it in this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a result, named schema may be supported, but it only means that the identifier is now treated as an identifier, even if the identifier is just the table name or has a named schema.

I don't know if the original spansql.ID included named schema or not. However, I believe that the intention of the code here is not to use the table name, but to use the identifier. There may be differences in behaviour for named schema, but the intent seems to be consistent and backwards compatible.

We could intentionally remove support for named schema, but then we would have more code to remove named schema, including indexes and so on. IMO, I think that if the case of len(path.Idents()) == 3 were to appear in the future, we wouldn't have to exclude it here.

Anyway, I leave the final decision to @daichirata.

Copy link

@apstndb apstndb Dec 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI
m[stmt.Name.SQL()] can be m["`group`.`order`"] for CREATE TABLE `group`.`order` (group and order are keyword.).

But currently, GetDatabaseDdl will emit CREATE TABLE `group.order` for that table definition so the map will contain m["`group.order`"].

https://issuetracker.google.com/issues/385901554

I believe this inconsistency can break diffs using spanner: prefix of hammer.

I recommend to unquote path identifiers as likestrings.Join(path.Name.Idents[].Name, ".")(pseudo code).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, I can't understand your opinion that path.SQL() is suitable for map key than dot separated raw identifiers.

Quoted identifiers are only GoogleSQL lexical representation for reserved identifiers.
In other places like INFORMATION_SCHEMA, no quoted idenfires are used.

This reminds me of the slogan "サニタイズ言うな" (Don't say sanitize) that was popular in the Japanese engineering community in the past.
In this slogal says, we shouldn't store escaped representation, escaping (using .SQL()) should be performed only in last one mile.

However, it's clear that we won't be addressing named schema support in this PR. Regarding how to handle map keys, I'll defer to the owner's review on that.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the good point about named schema!

Therefore, there shouldn’t be any breaking changes until hammer supports the CREATE SCHEMA DDL, and we can wait for GetDatabaseDdl to be fixed before adding support for CREATE SCHEMA.

I agree with this opinion. Since the CREATE SCHEMA statement cannot currently be parsed, I believe this issue does not manifest for now.

Copy link
Author

@morikuni morikuni Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The key of this map is not required to be correct Cloud Spanner table name (or table identifier), but just an identifier of the table.
It means we can also use anything that is "same tables output the same, and different tables output different", like "<schema>.<table>", [2]string{"<schema>", "<table>"}, or even "<schema>🔧<table>".
So, at least me, I don't care how actually SQL() format the table names. I just expect SQL() outputs the same string for the same table (and ASTs generally satisfies that).

Copy link
Owner

@daichirata daichirata Dec 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with both sides on many points. But I also thought that even if it is actually useful, the better case is that the table identity is not affected by the SQL character representation.

If you were to write code on the hammer side that performs joins each time using a map with Idents as keys, would the scope of changes become significant? @morikuni

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pushed a commit to stop usingSQL() for comparison: 18a5bd3

Revert it if not needed.


require (
cloud.google.com/go/spanner v1.62.0
github.com/cloudspannerecosystem/memefish v0.0.0-20241219043423-1efca7ff9732
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it is better to use released version

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Migrate from spansql to memefish
3 participants