Generally LGTM, good job!
icelake/src/io/task_writer.rs
Outdated
.iter()
.map(|field| {
    let array: ArrayRef = batch
        .column_by_name(&field.name)
We don't need to search by name for each partition; this lookup should happen once, when the writer is initialized.
I don't quite understand. It seems we can't do this in initialization. For every incoming batch, we need to extract the relevant column to compute the corresponding partition field. And there's no guarantee that incoming batches always have the same column order, so we need to search by name. (I'm not sure whether the function name is misleading.)
I think we should do it in initialization, and the column index should be found from the schema. The record batch's schema is required to match the table schema; otherwise the Parquet file's schema won't match the table schema.
Oh, I see. Yes we should do it.
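For illustration, the approach agreed on above might look roughly like the sketch below. It is not the code from this PR: `Schema`, `Batch`, and `PartitionedWriter` are hypothetical, stdlib-only stand-ins for the Arrow schema, the record batch, and the task writer, and columns are simplified to `Vec<i64>`. The idea is to resolve each partition field to a column index once, from the table schema, and then read columns by index for every batch, on the assumption that each batch's schema matches the table schema.

```rust
/// Hypothetical stand-in for an Arrow schema: just the ordered field names.
#[derive(Debug, Clone, PartialEq)]
pub struct Schema {
    pub field_names: Vec<String>,
}

impl Schema {
    /// Position of the column called `name`, if present.
    pub fn index_of(&self, name: &str) -> Option<usize> {
        self.field_names.iter().position(|f| f == name)
    }
}

/// Hypothetical stand-in for a record batch (columns simplified to i64).
pub struct Batch {
    pub schema: Schema,
    pub columns: Vec<Vec<i64>>,
}

/// Hypothetical writer that resolves partition columns once, at construction.
pub struct PartitionedWriter {
    table_schema: Schema,
    partition_col_indices: Vec<usize>,
}

impl PartitionedWriter {
    /// The by-name search happens here, once per writer.
    pub fn try_new(table_schema: Schema, partition_fields: &[&str]) -> Result<Self, String> {
        let partition_col_indices = partition_fields
            .iter()
            .map(|name| {
                table_schema
                    .index_of(name)
                    .ok_or_else(|| format!("partition field {} not in table schema", name))
            })
            .collect::<Result<Vec<_>, _>>()?;
        Ok(Self { table_schema, partition_col_indices })
    }

    /// Per batch: verify the schema matches, then index directly.
    pub fn partition_columns<'a>(&self, batch: &'a Batch) -> Result<Vec<&'a [i64]>, String> {
        if batch.schema != self.table_schema {
            return Err("record batch schema does not match table schema".to_string());
        }
        Ok(self
            .partition_col_indices
            .iter()
            .map(|&i| batch.columns[i].as_slice())
            .collect())
    }
}
```

With arrow-rs, `Schema::index_of` and `RecordBatch::column` would presumably play the roles of the stand-ins here. Rejecting a batch whose schema differs from the table schema also covers the column-order concern raised earlier in the thread.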
LGTM, Thanks
We need all test cases to use the same Docker environment.