-
Notifications
You must be signed in to change notification settings - Fork 189
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: support metadata table "snapshots" #822
Conversation
186f075
to
e137fbb
Compare
e137fbb
to
b3cf119
Compare
crates/iceberg/src/metadata_scan.rs
Outdated
/// Returns the schema of the metadata table. | ||
fn schema() -> Schema; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to make this separate and public, so engines can get the schema first without fetching the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the metadata tables (e.g. entires
) have schemas that depend on the table schema. E.g. readable_metrics
is a struct with field per column. Might be worth having this take a scan
as well or do so in another PR.
Signed-off-by: xxchan <[email protected]>
8ca0ab6
to
973129e
Compare
crates/iceberg/src/metadata_scan.rs
Outdated
pub struct MetadataScan { | ||
metadata_ref: TableMetadataRef, | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wondering what you think about
pub struct MetadataScan { | |
metadata_ref: TableMetadataRef, | |
} | |
pub struct MetadataScan<'a> { | |
table: &'a Table, | |
} |
so we can get at FileIO
and other table properties that aren't in the metadata.
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
973129e
to
f40aa87
Compare
crates/iceberg/src/metadata_scan.rs
Outdated
fn schema() -> Schema; | ||
|
||
/// Scans the metadata table. | ||
fn scan(scan: &MetadataScan) -> Result<RecordBatch>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thoughts on making async because some of the metadata scans will do async reads of metadata files.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for this, the manifests table requires an async call load_manifest_list()
with a dependency on FileIO
.
@rshkv Thanks for the review and the follow-up work. I just want to wait for the maintainers to make sure the framework looks good before applying any suggestions :p BTW I think we can also change the details like signatures and fields on demand later. |
Yeah, totally!
Yeah that's what I was worried about. Sg. 👍🏻 |
Signed-off-by: xxchan <[email protected]>
Signed-off-by: xxchan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, it looks much better now. Only some minor questions.
Signed-off-by: xxchan <[email protected]>
Thanks for the review. Updated. Please take a another look. |
Signed-off-by: xxchan <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is already in good shape. Let's merge it to unblock other pending PRs. Suggested changes can be made in a separate PR.
@@ -73,6 +73,7 @@ mod avro; | |||
pub mod io; | |||
pub mod spec; | |||
|
|||
pub mod metadata_scan; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be changed in to metadata_table
.
@@ -200,6 +201,12 @@ impl Table { | |||
TableScanBuilder::new(self) | |||
} | |||
|
|||
/// Creates a metadata table which provides table-like APIs for inspecting metadata. | |||
/// See [`MetadataTable`] for more details. | |||
pub fn metadata_table(self) -> MetadataTable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better to use &self
here to align with other APIs.
#823
In Iceberg Spark/Flink there are metadata tables that provide information around the table: https://iceberg.apache.org/docs/latest/spark-queries/#inspecting-tables
Supporting this in
iceberg-rust
allows other engines (like RisingWave) to support these "standard" metadata tables.In this PR, I added a trait for metadata tables and the first implementation for the
snapshots
metadata table. If it looks good, we can add more later.reference implementation:
- https://py.iceberg.apache.org/api/#inspecting-tables
- https://github.com/apache/iceberg-python/blob/main/pyiceberg/table/inspect.py#L58-L90