Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Test with PlanetScale #22

Closed
1 task
edgarrmondragon opened this issue Jan 30, 2024 · 10 comments · Fixed by #45
Closed
1 task

Test with PlanetScale #22

edgarrmondragon opened this issue Jan 30, 2024 · 10 comments · Fixed by #45

Comments

@edgarrmondragon
Copy link
Member

edgarrmondragon commented Jan 30, 2024

Add integration tests against a PlanetScale instance

Tasks

Preview Give feedback

Refs:

@edgarrmondragon
Copy link
Member Author

PlanetScale requires SSL authentication

@edgarrmondragon
Copy link
Member Author

Few other issues:

pymysql.err.OperationalError: (1105, "VT05003: unknown database 'information_schema' in vschema")
sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1105, "VT05003: unknown database 'information_schema' in vschema")
[SQL: DESCRIBE `information_schema`.`ADMINISTRABLE_ROLE_AUTHORIZATIONS`]

@BTheunissen
Copy link

I managed to get Planetscale working a few months ago, with the following patch to the tap to ignore reserved schemas.

def discover_catalog_entries(self) -> list[dict]:
        """Return a list of catalog entries from discovery.

        Returns:
            The discovered catalog entries as a list.
        """
        result: list[dict] = []
        engine = self.connector._engine
        inspected = sqlalchemy.inspect(engine)
        for schema_name in self.connector.get_schema_names(engine, inspected):
            # Do not include system tables.
            if schema_name in ["information_schema", "sys", "performance_schema", "mysql"]:
                continue
            # Iterate through each table and view
            for table_name, is_view in self.connector.get_object_names(
                engine,
                inspected,
                schema_name,
            ):
                catalog_entry = self.connector.discover_catalog_entry(
                    engine,
                    inspected,
                    schema_name,
                    table_name,
                    is_view,
                )
                result.append(catalog_entry.to_dict())

        return result

One issue I am hitting with Planetscale on larger table extraction is the 100k row limit for queries, such as discussed here, planetscale/discussion#190.

It was my understanding that the SQLStream implementation should be incrementally fetching with SQLAlchemy, but it doesn't seem like Planetscale interprets it as a batched query.

@visch
Copy link
Member

visch commented Feb 17, 2024

@BTheunissen

Great information on the 100k rows issue! I'll try the olap solution given here planetscale/discussion#388 and see what happens. If I get this running could you give it a shot on your end to see if it works? I'll be in touch (I hope!)

Filtering out those schemas does work, but it works just because views themselves just don't work. I'm trying to make views work as I know the use case is pretty important for a lot of DB users.

@visch
Copy link
Member

visch commented Feb 17, 2024

@BTheunissen #45 works now, if you could give this a shot and let me know how it goes for you!

Note that the 100k rows issue is solved, views should work, and the readme shows that filter_schemas is a config option you can use

@BTheunissen
Copy link

Amazing work, I can get this tested and get back to you :)

@BTheunissen
Copy link

@visch I haven't had to use MySQL views just yet but I have for quite a few other connectors/customers, so being able to support views will save me some potential future pain :)

@visch
Copy link
Member

visch commented Feb 19, 2024

@BTheunissen did this work for you!?

@dbussink
Copy link

FYI, the official singer tap for PlanetScale is located at https://github.com/planetscale/singer-tap/

This also supports things like incremental syncs. The binlogs on PlanetScale are not directly accessible so those can’t be used in the same way as in MySQL for incremental syncs.

@visch
Copy link
Member

visch commented Feb 21, 2024

Thanks @dbussink , I added a link to the README here.

Also I think we should get Planet Scales tap added to the hub, I made an issue here meltano/hub#1684

I was trying to get this working on my side but I hit this issue https://github.com/planetscale/singer-tap/issues/51 looks like the schema isn't following the singer spec?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants