Skip to content
This repository has been archived by the owner on Aug 13, 2024. It is now read-only.

feat: support partition writer #153

Merged
merged 3 commits into from
Aug 18, 2023
Merged

Conversation

ZENOTME
Copy link
Contributor

@ZENOTME ZENOTME commented Aug 17, 2023

No description provided.

Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, good job!

icelake/src/io/task_writer.rs Outdated Show resolved Hide resolved
.iter()
.map(|field| {
let array: ArrayRef = batch
.column_by_name(&field.name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to search by name for each partition, this should happen in initialization of writer.

Copy link
Contributor Author

@ZENOTME ZENOTME Aug 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite understand. Seems we can't do in initialization. For every batch coming in, we need to extract its related column to compute according partition field. And it's not gurantee that batch comes in is always have the same column order so we need to search it by name. (I'm not sure whether the function name is missleading.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should do it in initialization, and the column index should be found by schema. It's required that the record batch's schema should match table schema, otherwise the parquet file's schema doesn't match table schema.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. Yes we should do it.

icelake/src/io/task_writer.rs Show resolved Hide resolved
icelake/src/io/task_writer.rs Show resolved Hide resolved
tests/integration/rust/src/append.rs Show resolved Hide resolved
Copy link
Contributor

@liurenjie1024 liurenjie1024 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, Thanks

@ZENOTME
Copy link
Contributor Author

ZENOTME commented Aug 18, 2023

We need to let all test case to use a same docker env.

@liurenjie1024 liurenjie1024 merged commit 393d000 into icelake-io:main Aug 18, 2023
3 checks passed
@ZENOTME ZENOTME deleted the partition branch August 18, 2023 11:46
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants