Supply a hint arrow schema for casting Parquet field types during scans #814

@gruuya gruuya commented Dec 17, 2024

This avoids a potential schema mismatch that results from upcasting Arrow 8-bit and 16-bit integers to the Iceberg 32-bit integer type.
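To illustrate where the mismatch comes from, here is a minimal sketch. It assumes simplified stand-in enums (`ArrowInt`, `IcebergPrimitive`) and hypothetical helper names; none of these are the real arrow-rs or iceberg-rust types. It only shows that a round trip through the Iceberg type loses the original integer width.

```rust
// Simplified stand-ins for illustration; NOT the real arrow-rs or iceberg-rust types.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ArrowInt {
    Int8,
    Int16,
    Int32,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IcebergPrimitive {
    Int,
}

/// Iceberg has no 8-bit or 16-bit integer type, so all three widths map to `Int`.
fn arrow_to_iceberg(t: ArrowInt) -> IcebergPrimitive {
    match t {
        ArrowInt::Int8 | ArrowInt::Int16 | ArrowInt::Int32 => IcebergPrimitive::Int,
    }
}

/// The Arrow type reported for the table is derived back from the Iceberg type.
fn iceberg_to_arrow(t: IcebergPrimitive) -> ArrowInt {
    match t {
        IcebergPrimitive::Int => ArrowInt::Int32,
    }
}

/// True when the type stored in the file differs from the reported type,
/// i.e. when the scan would have to cast the column to match.
fn needs_cast(file_type: ArrowInt) -> bool {
    iceberg_to_arrow(arrow_to_iceberg(file_type)) != file_type
}
```

In this sketch, `needs_cast` is true for `Int8` and `Int16` columns and false for `Int32`, which is the case a hint schema is meant to cover.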

This is one way to resolve #813. Note that this is dependent on apache/arrow-rs#6892 getting merged (and picked up) first.

I still need to think of a proper test case for this too.

Closes #813.

Comment on lines +199 to +204
if task.schema.as_struct().fields().iter().any(|field| {
    matches!(
        field.field_type.as_ref(),
        Type::Primitive(PrimitiveType::Int)
    )
}) {

@gruuya gruuya Dec 17, 2024

Maybe this should only be done if a field of this type is also among the projected fields.
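A rough sketch of that narrower check, assuming simplified stand-in types and a hypothetical `projected_ids` slice of field IDs (the real iceberg-rust structs and field names may differ):

```rust
// Hypothetical, simplified stand-ins for the Iceberg schema types.
#[derive(Debug, Clone, PartialEq)]
enum PrimitiveType {
    Int,
    Long,
    String,
}

#[derive(Debug, Clone)]
struct Field {
    id: i32,
    field_type: PrimitiveType,
}

/// Returns true only if at least one *projected* field has the `Int` type.
/// Fields that are not part of the projection are ignored.
fn projected_int_present(fields: &[Field], projected_ids: &[i32]) -> bool {
    fields.iter().any(|field| {
        projected_ids.contains(&field.id) && matches!(field.field_type, PrimitiveType::Int)
    })
}
```

With this shape, a scan that projects no `Int` columns would skip the hint schema entirely, at the cost of one extra membership check per field.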

gruuya commented Dec 20, 2024

Making this a draft as the upstream dependency is also a draft atm.

@gruuya gruuya marked this pull request as draft December 20, 2024 08:30

Successfully merging this pull request may close these issues.

Reported and actual arrow schema of the table can be different