feat: Implement custom RecordBatch serde for shuffle for improved performance #1190
base: main
Conversation
use std::io::Write;
use std::sync::Arc;

pub fn write_batch_fast(
Are you going to end up implementing a form of arrow (stream) IPC?
I discovered that we may be able to just use https://docs.rs/arrow-ipc/latest/arrow_ipc/writer/struct.IpcDataGenerator.html#method.encoded_batch and am going to look into that next
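For reference, a minimal sketch of what using those lower-level APIs could look like (this is not the PR's actual code; it assumes the arrow-array, arrow-schema, and arrow-ipc crates and simplified error handling):

use arrow_array::RecordBatch;
use arrow_ipc::writer::{write_message, DictionaryTracker, IpcDataGenerator, IpcWriteOptions};
use arrow_schema::ArrowError;
use std::io::Write;

// Encode a single batch with the lower-level IPC APIs, skipping StreamWriter's
// per-stream schema message.
fn encode_batch<W: Write>(batch: &RecordBatch, mut out: W) -> Result<(), ArrowError> {
    let generator = IpcDataGenerator::default();
    let mut dictionary_tracker = DictionaryTracker::new(false);
    let options = IpcWriteOptions::default();

    // encoded_batch returns any dictionary messages plus the record batch message
    let (dictionaries, batch_data) = generator.encoded_batch(batch, &mut dictionary_tracker, &options)?;
    for dictionary in dictionaries {
        write_message(&mut out, dictionary, &options)?;
    }
    write_message(&mut out, batch_data, &options)?;
    Ok(())
}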
Are you going to end up implementing a form of arrow (stream) IPC?
Yes, but without using flatbuffers to align and encode anything, just the raw bytes, and without the metadata messages.
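To illustrate that idea, here is a simplified sketch (not the code in this PR) of writing a primitive array's value buffer as raw bytes with a length prefix, with no flatbuffers-encoded metadata message; null-buffer handling is omitted:

use arrow_array::Int32Array;
use std::io::Write;

// Illustrative only: write an Int32Array's value buffer as raw bytes with a
// u64 length prefix and no metadata message.
fn write_raw_values<W: Write>(arr: &Int32Array, out: &mut W) -> std::io::Result<()> {
    let values = arr.values().inner(); // the underlying arrow_buffer::Buffer
    out.write_all(&(values.len() as u64).to_le_bytes())?;
    out.write_all(values.as_slice())?;
    Ok(())
}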
native/core/benches/batch_serde.rs
fn create_batch() -> RecordBatch {
    let schema = Arc::new(Schema::new(vec![
        Field::new("c0", DataType::Utf8, true),
Thanks @andygrove. It would be interesting to see whether other data types keep the same performance benefit.
Codecov Report: All modified and coverable lines are covered by tests ✅

Additional details and impacted files:

@@            Coverage Diff             @@
##              main    #1190     +/-  ##
=========================================
- Coverage    34.69%   34.67%   -0.02%
+ Complexity     991      987       -4
=========================================
  Files          116      116
  Lines        44891    44883       -8
  Branches      9864     9884      +20
=========================================
- Hits         15574    15565       -9
+ Misses       26168    26155      -13
- Partials      3149     3163      +14

☔ View full report in Codecov by Sentry.
Thanks @andygrove, that's amazing. I suppose it is extensible to support nested types in the future?
Yes, I don't foresee any issues with that.
FYI I filed a ticket in DataFusion to consider adding this code (or something similar) upstream. I think it would help other serialization use cases (not just the shuffle reader/writer), for example sorting and grouping.
Thank you @andygrove
for compression_codec in [
    CompressionCodec::None,
    CompressionCodec::Lz4Frame,
    CompressionCodec::Zstd(1),
nit/optional: snappy and zstd(6) here as well?
    let null_buffer = self.read_null_buffer();
    Arc::new(BinaryArray::try_new(offset_buffer, buffer, null_buffer)?)
}
DataType::Dictionary(_, _) => {
To be safer, maybe add Int32Type to the key type for this pattern.
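For illustration, one way to express that constraint (a hypothetical helper, not code from the PR):

use arrow_schema::DataType;

// Only accept dictionaries with Int32 keys, rather than matching any key type.
fn is_supported_dictionary(data_type: &DataType) -> bool {
    matches!(data_type, DataType::Dictionary(key, _) if **key == DataType::Int32)
}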
Co-authored-by: Liang-Chi Hsieh <[email protected]>
FWIW, this encoding format is almost identical to the IPC format AFAICT, with only some minor changes to the metadata encoding. The exact same validation is done when reading an array, and both preserve the buffers as-is [1]. I suspect that much of the overhead is fixed overhead in StreamWriter, e.g. encoding the schema, and that this could be optimised and/or eliminated by using the lower-level APIs such as write_message and root_as_message. The benchmarks at least appear to agree with this, with a relatively fixed performance delta on the order of tens of microseconds between the two encoders.

Just thought I'd put it out there: there is likely low-hanging fruit in the arrow-rs IPC implementation, and improvements there would definitely be welcome, not just by myself.

[1] This is actually not entirely true: when dealing with sliced arrays, the Arrow IPC implementation has additional logic to avoid encoding data that isn't referenced.
DataType::Utf8 => {
    let arr = col.as_any().downcast_ref::<StringArray>().unwrap();
    // write data buffer
    self.write_buffer(arr.values())?;
FYI, if the array is sliced, this will write the full data buffer, as the slice is not "materialized" into the values buffer.
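A small example of the behaviour being pointed out (assuming the arrow-array crate):

use arrow_array::{Array, StringArray};

fn main() {
    let arr = StringArray::from(vec!["aa", "bb", "cc", "dd"]);
    let sliced = arr.slice(1, 2); // logical rows "bb" and "cc"

    // The slice only adjusts the offsets; values() still exposes the full
    // underlying data buffer ("aabbccdd"), which is what would get written.
    assert_eq!(sliced.len(), 2);
    assert_eq!(sliced.values().len(), 8);
}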
Which issue does this PR close?
Closes #1189
Rationale for this change
Arrow IPC is a good general-purpose serde framework, but we can get better performance by implementing specialized code optimized for Comet's use case of encoding single batches to shuffle blocks.
This PR implements a new BatchWriter and BatchReader and updates the shuffle writer to use them when possible (when all data types are supported), falling back to Arrow IPC otherwise.
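As a rough sketch of that dispatch (the names and the list of supported types here are illustrative, not the PR's actual API):

use arrow_schema::{DataType, Schema};

// Hypothetical helper: the custom serde is only used when every column has a
// supported type; otherwise the shuffle writer falls back to Arrow IPC.
fn can_use_fast_serde(schema: &Schema) -> bool {
    schema.fields().iter().all(|field| {
        matches!(
            field.data_type(),
            DataType::Boolean
                | DataType::Int8
                | DataType::Int16
                | DataType::Int32
                | DataType::Int64
                | DataType::Float32
                | DataType::Float64
                | DataType::Utf8
                | DataType::Binary
        )
    })
}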
Specializations include:
TPC-H
Based on the median of 5 runs, I am seeing a 2-3% speedup.
Microbenchmarks
Without compression:
With LZ4 compression:
With ZSTD compression:
Benchmark Results
What changes are included in this PR?
How are these changes tested?
Existing tests + updated unit tests.