feat: add s3tables catalog #807

Open: flaneur2020 wants to merge 14 commits into base: main

Conversation

@flaneur2020 flaneur2020 commented Dec 16, 2024

fixes #754

This PR adds an implementation of the s3tables catalog. I've tested CRUD on namespaces and tables from my local laptop against a real s3tables bucket.

A mocked test suite for CI still needs to be added in a separate PR.

@flaneur2020 flaneur2020 marked this pull request as draft December 16, 2024 14:54
@Xuanwo Xuanwo (Member) commented Dec 20, 2024

Hi @flaneur2020, I suggest splitting this PR into multiple ones to make it easier to review and accelerate the iteration speed.

@flaneur2020 flaneur2020 (Author) commented Dec 20, 2024

@Xuanwo I believe the only missing part of this PR is the tests. I created a real s3tables bucket to test it and it seems to work fine. Can you give some suggestions for the testing part?

I found that the glue catalog uses a mock service from moto, but a mock service for s3tables is not available in moto yet.

@flaneur2020 flaneur2020 marked this pull request as ready for review December 20, 2024 10:04
@Xuanwo Xuanwo (Member) left a comment

Thanks a lot for this PR, really great! Only some small suggestions.

use crate::utils::{create_metadata_location, create_sdk_config};

#[derive(Debug)]
pub struct S3TablesCatalogConfig {

Please add comments for all public structs.

}

impl S3TablesCatalog {
pub async fn new(config: S3TablesCatalogConfig) -> Result<Self> {

Same here: please add comments for all public APIs, ideally with an example where it's simple.
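For illustration, a minimal sketch of what documented public items could look like. This is not the PR's code: the constructor here is a synchronous stand-in that only checks for an empty ARN (the `new` in the PR is `async` and its internals are not shown in this thread), the `String` error type, the public field, and the `table_bucket_arn()` accessor are assumptions, and the ARN in the example is a placeholder. The doc example uses a `~~~` fence so it nests cleanly inside this block.

```rust
/// Configuration for an S3 Tables catalog.
#[derive(Debug, Clone)]
pub struct S3TablesCatalogConfig {
    /// ARN of the table bucket that the catalog operates on.
    /// (Public here only to keep the sketch short.)
    pub table_bucket_arn: String,
}

/// Catalog backed by an S3 Tables bucket.
#[derive(Debug)]
pub struct S3TablesCatalog {
    config: S3TablesCatalogConfig,
}

impl S3TablesCatalog {
    /// Creates a catalog from the given configuration.
    ///
    /// Returns an error if `table_bucket_arn` is empty.
    ///
    /// # Example
    ///
    /// ~~~
    /// let config = S3TablesCatalogConfig {
    ///     table_bucket_arn: "arn:aws:s3tables:us-east-1:111122223333:bucket/demo".to_string(),
    /// };
    /// let catalog = S3TablesCatalog::new(config).unwrap();
    /// assert!(catalog.table_bucket_arn().ends_with("bucket/demo"));
    /// ~~~
    pub fn new(config: S3TablesCatalogConfig) -> Result<Self, String> {
        // Simplified stand-in: a real constructor would also build an SDK client.
        if config.table_bucket_arn.is_empty() {
            return Err("table_bucket_arn must not be empty".to_string());
        }
        Ok(Self { config })
    }

    /// Returns the ARN of the table bucket this catalog was created with.
    pub fn table_bucket_arn(&self) -> &str {
        &self.config.table_bucket_arn
    }
}
```

Each public item gets a one-line summary, and the constructor documents its error case and carries a short example, which is roughly what the review comment asks for.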


#[derive(Debug)]
pub struct S3TablesCatalogConfig {
table_bucket_arn: String,

Hi, at first glance the hard requirement for table_bucket_arn is a bit confusing to me. Would you like to add a comment here explaining that all operations need table_bucket_arn rather than a bucket?

@flaneur2020 flaneur2020 (Author) Dec 24, 2024

The S3 Tables bucket is quite unusual: no files are stored under s3://{s3table_bucket}/. Instead, every table has its own special "bucket name" like s3://{xxxxxxxx}_s3_table, and its files are always accessed under paths like s3://{xxxxxxxx}_s3_table. You cannot browse any of the files in this table bucket from the S3 admin console either.

As I understand it, this ARN is the identifier of the abstract s3tables bucket and is used everywhere in the s3tables SDK; the plain bucket path is almost useless to us.

Let me add that to the comments.
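A rough sketch of how that explanation might read as a field comment, together with a shallow format check. The wording of the comment, the `new` constructor on the config, the `String` error type, and the `looks_like_table_bucket_arn` helper are all hypothetical and not taken from the PR; the check assumes the ARN shape `arn:<partition>:s3tables:<region>:<account-id>:bucket/<bucket-name>`.

```rust
/// Configuration for the S3 Tables catalog.
#[derive(Debug)]
pub struct S3TablesCatalogConfig {
    /// ARN of the table bucket, in the form
    /// `arn:aws:s3tables:<region>:<account-id>:bucket/<bucket-name>`.
    ///
    /// The ARN identifies the abstract table bucket and is what the S3 Tables
    /// API expects for every operation. Table files are not stored under a
    /// path in the table bucket itself; each table has its own location, so a
    /// plain bucket name is not enough to address anything.
    table_bucket_arn: String,
}

impl S3TablesCatalogConfig {
    /// Hypothetical constructor (not part of the PR) that rejects strings
    /// which do not look like a table bucket ARN.
    pub fn new(table_bucket_arn: impl Into<String>) -> Result<Self, String> {
        let arn = table_bucket_arn.into();
        if !looks_like_table_bucket_arn(&arn) {
            return Err(format!("not a table bucket ARN: {}", arn));
        }
        Ok(Self { table_bucket_arn: arn })
    }

    /// Returns the configured table bucket ARN.
    pub fn table_bucket_arn(&self) -> &str {
        &self.table_bucket_arn
    }
}

/// Hypothetical helper: shallow structural check only. It does not verify
/// that the bucket exists or that the region and account ID are valid.
fn looks_like_table_bucket_arn(arn: &str) -> bool {
    let parts: Vec<&str> = arn.splitn(6, ':').collect();
    match parts.as_slice() {
        ["arn", _partition, "s3tables", region, account_id, resource] => {
            !region.is_empty()
                && !account_id.is_empty()
                && resource
                    .strip_prefix("bucket/")
                    .map_or(false, |name| !name.is_empty())
        }
        _ => false,
    }
}
```

Failing early on something like `s3://my-bucket` gives a clearer error than letting the first SDK call reject it, though whether the catalog should validate at all is a design choice for the PR author.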

Successfully merging this pull request may close these issues.

Add s3tables catalog support
2 participants