-
Notifications
You must be signed in to change notification settings - Fork 182
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(puffin): Parse Puffin FileMetadata #765
base: main
Are you sure you want to change the base?
Conversation
d4703f4
to
b9c4c6f
Compare
crates/puffin/src/lib.rs
Outdated
@@ -23,3 +23,8 @@ | |||
|
|||
mod compression; | |||
pub use compression::CompressionCodec; | |||
|
|||
mod metadata; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that I have deliberately chosen to not expose anything at the moment.
I want to create a separate, intentional PR to add pub
modifiers after the reader and writer implementations have been added.
This way, we can develop inside this crate without having to worry about breaking changes until we're ready.
If it helps though, I intend to expose the following outside of the crate eventually:
pub use metadata::{BlobMetadata, FileMetadata, CREATED_BY_PROPERTY};
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great!
4d3dd9f
to
011945e
Compare
@fqaiser94, just added the higher level statistic files in #799 FYI. I would guess you would end up building those soon too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @fqaiser94 for working on this! Here are some minor suggestions.
} | ||
|
||
#[derive(Clone, PartialEq, Eq, Hash, Debug)] | ||
pub(crate) enum Flag { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will we have other flags?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently the Puffin spec only supports one flag (as documented here) but allows for the possibility that there may be more flags in the future.
file_read: &dyn FileRead, | ||
input_file_length: u64, | ||
) -> Result<u32> { | ||
let start = input_file_length - u64::from(FileMetadata::FOOTER_STRUCT_LENGTH); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can use FileMetadata::FOOTER_STRUCT_LENGTH as u64
for simple
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can make this change if you insist but my understanding is that as
can be dangerous and for that reason I would prefer to avoid. The majority of the from
calls in here are safe/loss-less conversions but there is one that is not (where I use try_from
) and I like being able to see that clearly in the code. Also, the from
calls provide compile-time safety which is useful in case anyone decides to change the type of any of these constants in the future which could result in an unsafe/lossy cast if we use as
. Agree that is not as nice to read but personally, I'll take compile-time safety every time 😄
let mut flags = HashSet::new(); | ||
for byte_number in 0..FileMetadata::FOOTER_STRUCT_FLAGS_LENGTH { | ||
let byte_offset = footer_bytes.len() | ||
- usize::from(FileMetadata::MAGIC_LENGTH) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same, we can use FileMetadata::MAGIC_LENGTH as usize
c007376
to
c06c3d1
Compare
c06c3d1
to
aa321ee
Compare
Part of #744
Summary
Add support for parsing Puffin FileMetadata
Context
This is the third of a number of PRs to add support for Iceberg Puffin file format.
Feel free to ask clarifying questions if anything looks confusing/weird.
Also, it might be helpful to refer to the overarching PR from which these changes were split to understand better how these changes will fit in to the larger picture.
Useful links
Java Puffin binary files
Note the Java Puffin binary files are copied from https://github.com/apache/iceberg/tree/main/core/src/test/resources/org/apache/iceberg/puffin/v1