Optimize Hive hash computation for nested types #2720
base: branch-25.02
Conversation
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Could you post the benchmark results here?
Maybe also describe how it failed on the string type and how to reproduce it, if you need help with that issue.
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
src/main/cpp/src/hive_hash.cu
Outdated
@@ -486,15 +526,77 @@ std::unique_ptr<cudf::column> hive_hash(cudf::table_view const& input,

  check_nested_depth(input);

  // `flattened_column_views` only contains nested columns and columns that result from flattening
  // nested columns
  std::vector<cudf::column_view> flattened_column_views;
The column_view constructor will calculate the null count, which is time consuming.
The original approach does not need to calculate the null count.
We may need to find a way to avoid this column_view array.
Please help check whether contiguous_copy_column_device_views is helpful here.
Fixed the illegal memory access error reported for the struct(string) type:

diff --git a/src/main/cpp/src/hive_hash.cu b/src/main/cpp/src/hive_hash.cu
index ca720b5..5e9bb35 100644
--- a/src/main/cpp/src/hive_hash.cu
+++ b/src/main/cpp/src/hive_hash.cu
@@ -22,6 +22,7 @@
#include <cudf/structs/structs_column_view.hpp>
#include <cudf/table/experimental/row_operators.cuh>
#include <cudf/table/table_device_view.cuh>
+#include <cudf/table/table_device_view.cuh>
#include <rmm/cuda_stream_view.hpp>
#include <rmm/exec_policy.hpp>
@@ -566,17 +567,9 @@ std::unique_ptr<cudf::column> hive_hash(cudf::table_view const& input,
}
}
- std::vector<cudf::column_device_view> device_flattened_column_views;
- device_flattened_column_views.reserve(flattened_column_views.size());
-
- std::transform(
- flattened_column_views.begin(),
- flattened_column_views.end(),
- std::back_inserter(device_flattened_column_views),
- [&stream](auto const& col) { return *cudf::column_device_view::create(col, stream); });
+ [[maybe_unused]] auto [device_view_owners, device_flattened_column_views] =
+ cudf::contiguous_copy_column_device_views<cudf::column_device_view>(flattened_column_views, stream);
- auto flattened_column_device_views =
- cudf::detail::make_device_uvector_async(device_flattened_column_views, stream, mr);
auto first_child_index_view =
cudf::detail::make_device_uvector_async(first_child_index, stream, mr);
auto nested_column_map_view =
@@ -594,7 +587,7 @@ std::unique_ptr<cudf::column> hive_hash(cudf::table_view const& input,
output_view.end<hive_hash_value_t>(),
hive_device_row_hasher<hive_hash_function, bool>(nullable,
*input_view,
- flattened_column_device_views.data(),
+ device_flattened_column_views,
first_child_index_view.data(),
nested_column_map_view.data()));
Please apply the above patch.
Signed-off-by: Yan Feng <[email protected]>
- [&stream](auto const& col) { return *cudf::column_device_view::create(col, stream); });

Remember to never dereference the output from cudf::column_device_view::create like this.
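A minimal sketch of why that pattern is dangerous, assuming standard libcudf semantics (illustrative code, not part of this PR): column_device_view::create returns a unique_ptr whose deleter frees the device memory that holds the view's child views, so a copy obtained by dereferencing a temporary owner is left dangling for nested and string columns, which matches the illegal memory access seen for struct(string).

```cpp
#include <cudf/column/column_device_view.cuh>
#include <cudf/column/column_view.hpp>
#include <rmm/cuda_stream_view.hpp>

// Anti-pattern: the unique_ptr returned by create() is a temporary and is
// destroyed at the end of the return statement, freeing the allocation that
// backs the copied view's children (e.g. the offsets child of a string column).
cudf::column_device_view make_dangling_view(cudf::column_view const& col,
                                            rmm::cuda_stream_view stream)
{
  return *cudf::column_device_view::create(col, stream);
}

// Safer pattern: keep the owner alive for as long as the device view is used.
void use_view_safely(cudf::column_view const& col, rmm::cuda_stream_view stream)
{
  auto const owner = cudf::column_device_view::create(col, stream);
  auto const view  = *owner;  // valid only while `owner` is in scope
  // ... launch kernels that consume `view` here, before `owner` is destroyed
}
```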
Co-authored-by: Nghia Truong <[email protected]>
I just came up with some random thought: … Please correct me if I am wrong. And even if this is doable, I am happy to do it in a follow-up PR since it is a NIT improvement.
Good idea for struct/primitive-only types. For struct/primitive nested types (without list), we can use this idea.
Yes, you are right! For the list type, it seems impossible to get rid of stack operations.
LGTM in general, except for some NIT improvements.
Co-authored-by: Nghia Truong <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
We do not need:

  private:
    hash_functor_t hash_functor;
    hive_device_row_hasher const& _parent;

Move …
Signed-off-by: Yan Feng <[email protected]>
This is a good idea. It can save a register by eliminating the reference variable _parent.
Signed-off-by: Yan Feng <[email protected]>
Here is another optimization:

diff --git a/src/main/cpp/src/hive_hash.cu b/src/main/cpp/src/hive_hash.cu
index 7ce8026f22..49b04de86c 100644
--- a/src/main/cpp/src/hive_hash.cu
+++ b/src/main/cpp/src/hive_hash.cu
@@ -39,13 +39,6 @@ using hive_hash_value_t = int32_t;
constexpr hive_hash_value_t HIVE_HASH_FACTOR = 31;
constexpr hive_hash_value_t HIVE_INIT_HASH = 0;
-struct col_info {
- cudf::type_id type_id;
- cudf::size_type
- nested_num_children_or_basic_col_idx; // Number of children for nested types, or column index
- // in `basic_cdvs` for basic types
-};
-
hive_hash_value_t __device__ inline compute_int(int32_t key) { return key; }
hive_hash_value_t __device__ inline compute_long(int64_t key)
@@ -158,6 +151,18 @@ hive_hash_value_t __device__ inline hive_hash_function<cudf::timestamp_us>::oper
return static_cast<hive_hash_value_t>(result);
}
+/**
+ * @brief The struct storing column's auxiliary information.
+ */
+struct col_info {
+ // Column type id.
+ cudf::type_id type_id;
+
+ // Store the upper bound of number of elements for lists column, or the upper bound of
+ // number of children for structs column, or column index in `basic_cdvs` for basic types.
+ cudf::size_type upper_bound_idx_or_basic_col_idx;
+};
+
/**
* @brief Computes the hash value of a row in the given table.
*
@@ -205,11 +210,11 @@ class hive_device_row_hasher {
auto const col_info = _col_infos[flattened_index];
auto const col_hash =
(col_info.type_id == cudf::type_id::LIST || col_info.type_id == cudf::type_id::STRUCT)
- ? hash_nested(flattened_index, row_index)
+ ? hash_nested(flattened_index, row_index, col_info)
: cudf::type_dispatcher<cudf::experimental::dispatch_void_if_nested>(
cudf::data_type{col_info.type_id},
_hash_functor,
- _basic_cdvs[col_info.nested_num_children_or_basic_col_idx],
+ _basic_cdvs[col_info.upper_bound_idx_or_basic_col_idx],
row_index);
return HIVE_HASH_FACTOR * hash + col_hash;
});
@@ -221,19 +226,24 @@ class hive_device_row_hasher {
*/
struct col_stack_frame {
private:
- cudf::size_type _col_idx; // the column index in the flattened array
- cudf::size_type _row_idx; // the index of the row in the column
- int _idx_to_process; // the index of child or element to process next
- hive_hash_value_t _cur_hash; // current hash value of the column
+ col_info _col_info; // the column info
+ cudf::size_type _col_idx; // the column index in the flattened array
+ cudf::size_type _row_idx; // the index of the row in the column
+ cudf::size_type _idx_to_process; // the index of child or element to process next
+ hive_hash_value_t _cur_hash; // current hash value of the column
public:
__device__ col_stack_frame() = default;
- __device__ void init(cudf::size_type col_index, cudf::size_type row_idx)
+ __device__ void init(cudf::size_type col_index,
+ cudf::size_type row_idx,
+ cudf::size_type idx_begin,
+ col_info info)
{
_col_idx = col_index;
_row_idx = row_idx;
- _idx_to_process = 0;
+ _idx_to_process = idx_begin;
+ _col_info = info;
_cur_hash = HIVE_INIT_HASH;
}
@@ -248,6 +258,8 @@ class hive_device_row_hasher {
__device__ int get_idx_to_process() const { return _idx_to_process; }
+ __device__ col_info get_col_info() const { return _col_info; }
+
__device__ cudf::size_type get_col_idx() const { return _col_idx; }
__device__ cudf::size_type get_row_idx() const { return _row_idx; }
@@ -366,74 +378,96 @@ class hive_device_row_hasher {
*
* @param flattened_index The index of the column in the flattened array
* @param row_index The index of the row to compute the hash for
+ * @param curr_col_info the column's information
* @return The computed hive hash value
*/
__device__ hive_hash_value_t hash_nested(cudf::size_type flattened_index,
- cudf::size_type row_index) const noexcept
+ cudf::size_type row_index,
+ col_info curr_col_info) const noexcept
{
auto next_col_idx = flattened_index + 1;
col_stack_frame col_stack[MAX_STACK_DEPTH];
int stack_size = 0;
- col_stack[stack_size++].init(flattened_index, row_index);
+
+ // If the current column is a lists column, we need to store the upper bound row offset.
+ // Otherwise, it is a structs column and already stores the number of children.
+ cudf::size_type curr_idx_begin = 0;
+ if (curr_col_info.type_id == cudf::type_id::LIST) {
+ auto const offsets = _basic_cdvs[_col_infos[next_col_idx].upper_bound_idx_or_basic_col_idx];
+ curr_idx_begin = offsets.template element<cudf::size_type>(row_index);
+ curr_col_info.upper_bound_idx_or_basic_col_idx =
+ offsets.template element<cudf::size_type>(row_index + 1);
+ }
+ if (curr_col_info.upper_bound_idx_or_basic_col_idx == curr_idx_begin) { return HIVE_INIT_HASH; }
+
+ col_stack[stack_size++].init(flattened_index, row_index, curr_idx_begin, curr_col_info);
while (stack_size > 0) {
col_stack_frame& top = col_stack[stack_size - 1];
auto const curr_col_idx = top.get_col_idx();
auto const curr_row_idx = top.get_row_idx();
- auto const curr_col_info = _col_infos[curr_col_idx];
+ auto const curr_col_info = top.get_col_info();
// Do not pop it until it is processed. The definition of `processed` is:
// - For structs, it is when all child columns are processed.
// - For lists, it is when all elements in the list are processed.
if (curr_col_info.type_id == cudf::type_id::STRUCT) {
- if (top.get_idx_to_process() == curr_col_info.nested_num_children_or_basic_col_idx) {
+ if (top.get_idx_to_process() == curr_col_info.upper_bound_idx_or_basic_col_idx) {
if (--stack_size > 0) { col_stack[stack_size - 1].update_cur_hash(top.get_hash()); }
} else {
// Reset `next_col_idx` to keep track of the struct's children index.
if (top.get_idx_to_process() == 0) { next_col_idx = curr_col_idx + 1; }
- while (top.get_idx_to_process() < curr_col_info.nested_num_children_or_basic_col_idx) {
+
+ while (top.get_idx_to_process() < curr_col_info.upper_bound_idx_or_basic_col_idx) {
top.get_and_inc_idx_to_process();
auto const child_col_idx = next_col_idx++;
- auto const child_info = _col_infos[child_col_idx];
+ auto child_info = _col_infos[child_col_idx];
// If the child is of primitive type, accumulate child hash into struct hash
if (child_info.type_id != cudf::type_id::LIST &&
child_info.type_id != cudf::type_id::STRUCT) {
- auto const child_col = _basic_cdvs[child_info.nested_num_children_or_basic_col_idx];
+ auto const child_col = _basic_cdvs[child_info.upper_bound_idx_or_basic_col_idx];
auto const child_hash =
cudf::type_dispatcher<cudf::experimental::dispatch_void_if_nested>(
child_col.type(), _hash_functor, child_col, curr_row_idx);
top.update_cur_hash(child_hash);
} else {
- col_stack[stack_size++].init(child_col_idx, curr_row_idx);
+ cudf::size_type child_idx_begin = 0;
+ if (child_info.type_id == cudf::type_id::LIST) {
+ auto const child_offsets_col_idx = child_col_idx + 1;
+ auto const child_offsets =
+ _basic_cdvs[_col_infos[child_offsets_col_idx].upper_bound_idx_or_basic_col_idx];
+ child_idx_begin = child_offsets.template element<cudf::size_type>(curr_row_idx);
+ child_info.upper_bound_idx_or_basic_col_idx =
+ child_offsets.template element<cudf::size_type>(curr_row_idx + 1);
+
+ // Ignore this child if it does not have any element.
+ if (child_info.upper_bound_idx_or_basic_col_idx == child_idx_begin) {
+ next_col_idx += 2;
+ }
+ }
+ if (child_info.upper_bound_idx_or_basic_col_idx > child_idx_begin) {
+ col_stack[stack_size++].init(
+ child_col_idx, curr_row_idx, child_idx_begin, child_info);
+ }
break;
}
}
}
} else if (curr_col_info.type_id == cudf::type_id::LIST) {
- // Get the child column of the list column
- auto const offsets_col_idx = curr_col_idx + 1;
- auto const child_col_idx = curr_col_idx + 2;
+ auto const child_col_idx = curr_col_idx + 2;
+ auto child_info = _col_infos[child_col_idx];
// Move `next_col_idx` forward pass the current lists column.
// Children of a lists column always stay next to it and are not tracked by this.
if (next_col_idx <= child_col_idx) { next_col_idx = child_col_idx + 1; }
- auto const offsets_col =
- _basic_cdvs[_col_infos[offsets_col_idx].nested_num_children_or_basic_col_idx];
-
- auto const child_col_info = _col_infos[child_col_idx];
- auto const child_row_idx_begin =
- offsets_col.template element<cudf::size_type>(curr_row_idx);
- auto const child_row_idx_end =
- offsets_col.template element<cudf::size_type>(curr_row_idx + 1);
-
// If the child column is of primitive type, directly compute the hash value of the list
- if (child_col_info.type_id != cudf::type_id::LIST &&
- child_col_info.type_id != cudf::type_id::STRUCT) {
- auto const child_col = _basic_cdvs[child_col_info.nested_num_children_or_basic_col_idx];
+ if (child_info.type_id != cudf::type_id::LIST &&
+ child_info.type_id != cudf::type_id::STRUCT) {
+ auto const child_col = _basic_cdvs[child_info.upper_bound_idx_or_basic_col_idx];
auto const single_level_list_hash = cudf::detail::accumulate(
- thrust::counting_iterator(child_row_idx_begin),
- thrust::counting_iterator(child_row_idx_end),
+ thrust::make_counting_iterator(top.get_idx_to_process()),
+ thrust::make_counting_iterator(curr_col_info.upper_bound_idx_or_basic_col_idx),
HIVE_INIT_HASH,
[child_col, hasher = _hash_functor] __device__(auto hash, auto element_index) {
auto cur_hash = cudf::type_dispatcher<cudf::experimental::dispatch_void_if_nested>(
@@ -443,16 +477,35 @@ class hive_device_row_hasher {
top.update_cur_hash(single_level_list_hash);
if (--stack_size > 0) { col_stack[stack_size - 1].update_cur_hash(top.get_hash()); }
} else {
- if (top.get_idx_to_process() == child_row_idx_end - child_row_idx_begin) {
+ if (top.get_idx_to_process() == curr_col_info.upper_bound_idx_or_basic_col_idx) {
if (--stack_size > 0) { col_stack[stack_size - 1].update_cur_hash(top.get_hash()); }
} else {
// Push the next element into the stack
- col_stack[stack_size++].init(child_col_idx,
- child_row_idx_begin + top.get_and_inc_idx_to_process());
+ cudf::size_type child_idx_begin = 0;
+ if (child_info.type_id == cudf::type_id::LIST) {
+ auto const child_offsets_col_idx = child_col_idx + 1;
+ auto const child_offsets =
+ _basic_cdvs[_col_infos[child_offsets_col_idx].upper_bound_idx_or_basic_col_idx];
+ child_idx_begin =
+ child_offsets.template element<cudf::size_type>(top.get_idx_to_process());
+ child_info.upper_bound_idx_or_basic_col_idx =
+ child_offsets.template element<cudf::size_type>(top.get_idx_to_process() + 1);
+
+ // Ignore this child if it does not have any element.
+ if (child_info.upper_bound_idx_or_basic_col_idx == child_idx_begin) {
+ next_col_idx += 2;
+ }
+ }
+ if (child_info.upper_bound_idx_or_basic_col_idx > child_idx_begin) {
+ col_stack[stack_size++].init(
+ child_col_idx, top.get_idx_to_process(), child_idx_begin, child_info);
+ }
+ top.get_and_inc_idx_to_process();
}
}
}
}
+
return col_stack[0].get_hash();
}
@@ -506,11 +559,13 @@ void flatten_table(std::vector<col_info>& col_infos,
column_processer_fn_t flatten_column = [&](cudf::column_view const& col) {
auto const type_id = col.type().id();
if (type_id == cudf::type_id::LIST) {
- col_infos.emplace_back(col_info{type_id, col.num_children()});
+ // Nested size will be updated separately for each row.
+ col_infos.emplace_back(col_info{type_id, -1});
auto const list_col = cudf::lists_column_view(col);
flatten_column(list_col.offsets());
flatten_column(list_col.get_sliced_child(stream));
} else if (type_id == cudf::type_id::STRUCT) {
+ // Nested size for struct columns is number of children.
col_infos.emplace_back(col_info{type_id, col.num_children()});
auto const struct_col = cudf::structs_column_view(col);
for (auto child_idx = 0; child_idx < col.num_children(); child_idx++) {

This patch reduces global memory access as much as possible. Let's see how the benchmark looks.
Co-authored-by: Nghia Truong <[email protected]>
src/main/cpp/src/hive_hash.cu
Outdated
child_info.upper_bound_idx_or_basic_col_idx =
  child_offsets.template element<cudf::size_type>(top.get_idx_to_process() + 1);

// Ignore this child if it does not have any element.
An empty list will also affect the hash value of its parent.
I added some tests for corner cases.
The code has become increasingly difficult to understand and maintain. 🤕
An empty list will also affect the hash value of its parent.

From the docs, I see that:

 * hive_hash_value_t hive_hash(NestedType element) {
 *   hive_hash_value_t hash = HIVE_INIT_HASH;
 *   for (int i = 0; i < element.num_child(); i++) {
 *     hash = hash * HIVE_HASH_FACTOR + hive_hash(element.get_child(i));
 *   }
 *   return hash;
 * }

So when num_child (or the number of list elements) is 0, the for loop does not execute, thus the returned value is HIVE_INIT_HASH. In my patch, this for loop is skipped completely when there are no children/list elements, which yields basically the same output.
HIVE_INIT_HASH is 0. The hash value of structContainsNoChild itself is 0, but the hash values of struct(int1, structContainsNoChild, int2) and struct(int1, int2) are different. The current code is missing a call to top.update_cur_hash(structContainsNoChild_hash).
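A standalone numeric illustration of this corner case (not the project's code; h1 and h2 are made-up placeholder hashes): the empty child contributes 0, but the parent still multiplies its accumulator by 31 for that child, so the final hashes diverge.

```cpp
#include <cassert>
#include <cstdint>
#include <initializer_list>

// Fold child hashes the way Hive does: hash = hash * 31 + child_hash.
int32_t hive_accumulate(std::initializer_list<int32_t> child_hashes)
{
  int32_t hash = 0;  // HIVE_INIT_HASH
  for (auto h : child_hashes) { hash = hash * 31 + h; }  // HIVE_HASH_FACTOR = 31
  return hash;
}

int main()
{
  int32_t const h1 = 7, h2 = 11;       // placeholder hashes of int1 and int2
  int32_t const empty_child_hash = 0;  // a struct with no children hashes to HIVE_INIT_HASH

  // struct(int1, int2)                        -> 31*h1 + h2           = 228
  // struct(int1, structContainsNoChild, int2) -> 31*(31*h1 + 0) + h2  = 6738
  assert(hive_accumulate({h1, h2}) != hive_accumulate({h1, empty_child_hash, h2}));
  return 0;
}
```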
Actually, I am not satisfied with storing cur_hash in the stack_frame, because this requires modifying the new stack top when popping. I think that cur_hash can be replaced with the exponent of 31 (or the result of …). Hive hash only involves addition and multiplication of INT32, so changing the order of operations should not affect the result.
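A small sketch of the algebra behind this idea (illustrative only, not a proposed implementation; the child hashes are placeholder values): because the accumulation is just INT32 multiply-add, the Horner-style nesting can be rewritten as a weighted sum of powers of 31, so a child's contribution could in principle be applied directly to an ancestor's hash with a precomputed weight instead of carrying cur_hash in every stack frame.

```cpp
#include <cassert>
#include <cstdint>

int main()
{
  // Placeholder child hashes, kept small so no overflow occurs in this demo; with the
  // wraparound (mod 2^32) arithmetic used by Hive hash the identity holds in general.
  int32_t const h1 = 7, h2 = 11, h3 = 13;

  int32_t const nested   = (h1 * 31 + h2) * 31 + h3;     // hash = hash*31 + child, applied in order
  int32_t const weighted = h1 * 31 * 31 + h2 * 31 + h3;  // same value as a weighted sum

  assert(nested == weighted);
  return 0;
}
```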
The current code is missing a call to top.update_cur_hash(structContainsNoChild_hash).
This should fix that corner case:
diff --git a/src/main/cpp/src/hive_hash.cu b/src/main/cpp/src/hive_hash.cu
index e152bea33c..d8a33b3d35 100644
--- a/src/main/cpp/src/hive_hash.cu
+++ b/src/main/cpp/src/hive_hash.cu
@@ -448,6 +448,8 @@ class hive_device_row_hasher {
if (child_info.upper_bound_idx_or_basic_col_idx > child_idx_begin) {
col_stack[stack_size++].init(
child_col_idx, curr_row_idx, child_idx_begin, child_info);
+ } else {
+ top.update_cur_hash(HIVE_INIT_HASH);
}
break;
}
@@ -499,6 +501,8 @@ class hive_device_row_hasher {
if (child_info.upper_bound_idx_or_basic_col_idx > child_idx_begin) {
col_stack[stack_size++].init(
child_col_idx, top.get_idx_to_process(), child_idx_begin, child_info);
+ } else {
+ top.update_cur_hash(HIVE_INIT_HASH);
}
top.get_and_inc_idx_to_process();
}
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
Co-authored-by: Nghia Truong <[email protected]>
The new code reduces stack memory usage by 3X. This can reduce register pressure and potentially lead to higher occupancy and better performance.
Signed-off-by: Yan Feng <[email protected]>
Signed-off-by: Yan Feng <[email protected]>
Main Optimization:
- Flatten nested columns in advance, reducing the size of the stack_frame. (A simplified sketch of this flattening idea is shown below.)

Possible Further Optimization:
- xxhash64 does not require _cur_hash, which presents a challenge for unification.

Benchmark:
- struct with a depth of max_depth, with the basic type being INT32, FLOAT32 and STRING
- list with a depth of max_depth, with the basic type being INT32
- list with a depth of max_depth, with the basic type being STRING
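A minimal host-side sketch of the "flatten in advance" idea referenced above (illustrative only; it simplifies the flatten_table logic in the patch, e.g. it ignores sliced columns and offset handling, and the names flatten/col_info mirror but are not exactly the project code):

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/types.hpp>
#include <vector>

// Per-column auxiliary info consumed by the device-side hasher.
struct col_info {
  cudf::type_id type_id;
  cudf::size_type num_children_or_basic_idx;  // child count for nested, index into basic columns otherwise
};

// Pre-order flattening: nested columns contribute only a col_info entry, while
// basic (primitive/string) columns are collected separately so that only they
// need device views.
void flatten(cudf::column_view const& col,
             std::vector<col_info>& infos,
             std::vector<cudf::column_view>& basic_cols)
{
  auto const id = col.type().id();
  if (id == cudf::type_id::STRUCT || id == cudf::type_id::LIST) {
    infos.push_back({id, col.num_children()});
    for (cudf::size_type i = 0; i < col.num_children(); ++i) {
      flatten(col.child(i), infos, basic_cols);
    }
  } else {
    infos.push_back({id, static_cast<cudf::size_type>(basic_cols.size())});
    basic_cols.push_back(col);
  }
}
```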