Refactor to use thrust::reduce on any. #685

ZelboK · 2024-07-27T20:34:22Z

Saw the issue open for this and did this on a whim. I'm not too comfortable deleting code from this repository so for brevity I'll leave things as is. All this does is just use thrust::reduce instead of the defined kernels

Some clarifying points:

This PR is likely not complete. I figured it would be better to get feedback earlier on incase I am doing something wrong.
Tests are seemingly passing on my 3080 WSL2 build but some of them segfault on main so I can't replicate a perfect test scenario.
I am leaving the dispatching to thrust. The Executors I didn't spend too much time studying.

ZelboK · 2024-07-27T20:35:49Z

include/matx/operators/any.h

+          inp_ptr + a_.TotalSize(), 
+          op.Init(), 
+          op);
+        *result_ptr = result;


So I'm not really that faimliar with MatX. on line 79 I am doing a device_pointer_cast - IIUC the Executor is what determines if it's on host or device. I changed Executor ex to Executor to get past the warning as error that it wasn't used.

Hi @ZelboK, the Exec functions take any generic operator and calls the transform function. In this case there's a separate transform overload for a CUDA executor and a host executor. Since the input can be any type of operator and not just a tensor, there may not be a Data() method since it doesn't have to have memory backing it. For example, a user could do:

(a = any(ones<int>({4,4}))).run();

ones has no Data function since it has no memory backing it. So for this to work properly it would have to use thrust's iterator interface and wrap the operator in that as we do in other transforms. This is not a trivial change and can be a bit difficult if you're not familiar with the library.

@cliffburdick I've contributed a bit to CCCL actually so I think I should be fine in that regard. That makes sense to me so far, I'll have to try and actually implement it to see how it goes. I presume you are referring to make_reduce_iterator for example. Curious to know why this isn't trivial though, is there something I'm missing?

Tangentially, is there a way to drastically reduce compile times? The feedback loop right now takes quite a long time.

I run with these options

cmake -DMATX_BUILD_TESTS=ON -DMATX_BUILD_BENCHMARKS=OFF -DMATX_BUILD_EXAMPLES=OFF -DMATX_BUILD_DOCS=OFF ..

and have tried commenting out tests but it still takes a long time before I actually get to see errors from the compiler. I have a pretty decent CPU (i9 12900k) too.

Hi @ZelboK, it might not be too hard then if you're familiar with CCCL and their iterators. We have our own iterator classes in iterator.h. Typically the way we use them is like this:

https://github.com/NVIDIA/MatX/blob/main/include/matx/transforms/cub.h#L720

We write a lambda to perform the function (CUB in this case), and ReduceInput wraps it in iterators and collapses it. The tricky part might be that we have not done this with thrust though, so I don't know if our iterators are missing something to get it to work there.

To reduce compile times you shouldn't build everything each time. What I do is I take an existing example like fft_conv.cu (or make a new one), put my code in there that I'm testing, and compile just that with something like make fft_conv. That should compile in just about 10 seconds on most machines, whereas compiling everything can approach an hour on weaker machines.

Thank you @cliffburdick , I test out my code in an example now and the feedback loop is much better. Much appreciated.

ZelboK · 2024-07-27T20:35:59Z

include/matx/transforms/reduce.h

-  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T Reduce(const T &v1, const T &v2)
-  {
+
+  __MATX_HOST__ __MATX_DEVICE__ __MATX_INLINE__ T operator()(const T &v1, const T &v2) const {


needs to be const for Thrust.

cliffburdick

Hi @ZelboK , thanks for the PR! I left a comment that outlines the basic summary. This change may require significant work since it has to use iterators and we currently have no existing operator that uses thrust as an example.

…od. Refactor so that the operator_utils is capable of dispatching to thrust

ZelboK · 2024-07-30T01:26:43Z

include/matx/operators/any.h

+        output_t out_base = output_;
+        auto op = detail::reduceOpAny<value_t>();
+
+        auto rv = ReduceInputThrust(std::forward<OpA>(a_), std::forward<out_t>(output_), std::forward<decltype(op)>(op));


We construct the op, and then send the inputs and outputs in their original form to the new function ReduceInputThrust and of which gets collapsed and dispatched to thrust accordingly.

include/matx/operators/any.h

ZelboK · 2024-07-30T01:31:05Z

include/matx/core/operator_utils.h

+      if (in_base.IsContiguous()) {
+        // the conversion is already handled for us by RandomOperatorIterator
+        thrust::reduce(
+            iter + *begin, iter + *end, op.Init(), op


@cliffburdick

So this function is responsible for dealing with the facct that In may not necessarily be a tensor_t but rather an operator of some sort. It'll get the respective offsets and construct an iterator for thrust to use, which, thankfully, does seem to be perfectly compatible.

Consequentially it would seem the code has also become potentially simpler than it's counterpart on main. Thrust is now responsible for deciding whom exactly to use in CUB rather than MatX. I might be missing some context though on whether or not MatX needing to be responsible for what function to use in CUB/Thrust.

In hindsight, I realize now that there might be issues with this approach. Does matXBinaryOp for example need to be utilized? I see that it has methods like PreRun PostRun etc that make use of the Executor(which I have used here)

Hi @ZelboK since a tensor is an operator, the iterator wrapper can turn any MatX operator type into an iterator. However, we do the contiguous check to allow CUB/thrust an optimization if it's a flat pointer with contiguous strides.

Regarding your second comment, the ReduceInput function shouldn't/doesn't need to know whether it's CUB or thrust. That's something you pass in your lambda that you give to the function. I'm saying this all without actually trying it, but if you assume the iterator type is compatible between both libraries then there may not be any changes to that code. This was how the example I pointed to previously worked (just search ReduceInput in cub.h).

I'm not sure what your question about matxBinaryOp is, but that's a wrapper class for any binary type and should be completely separate from this. matxBinaryOp, like most of our types is an operator, and you can pass it to thrust/cub and have the iterator pull from it.

@cliffburdick I see. I should have clarified - I actually did try to follow that example you've shown me like this(but I did it incorrectly). If you're curious, here's what I did.

auto output_ = cuda::std::get<0>(out); using out_t = decltype(output_); using value_t = typename out_t::value_type; using input_t = typename detail::base_type_t<OpA>; // incorrect, this is not any.h responsibility using output_t = typename detail::base_type_t<out_t>; // incorrect input_t in_base = a_; output_t out_base = output_; auto op = detail::reduceOpAny<value_t>(); auto fn = [&](input_t &&input, output_t && output, BeginOffset<input_t> &&begin, EndOffset<input_t> &&end) { return thrust::reduce( input + *begin, input + *end, op.Init(), op ); }; auto rv = ReduceInput(fn, out_base, in_base);

which brought me to my question of matXBinaryOp. This led to errors like

has no member "type" typedef typename thrust::iterator_system<InputIterator>::type System;

where InputIterator was of type

matx::detail::matxBinaryOp<matx::detail::ConstVal<int, cuda::std::__4::array<matx::index_t, 2UL>>, matx::index_t, matx::detail::BinOp<int, matx::index_t, matx::detail::AddF<int, matx::index_t>>>

I should not be passing the base types though. Let me push the fix. Thanks for the help.

ZelboK · 2024-07-30T14:59:14Z

include/matx/operators/any.h

+              input + *begin, input + *end, op.Init(), op
+            );
+        };
+        auto rv = ReduceInput(fn, output_, a_);


The iterators are thankfully compatible.

ZelboK · 2024-08-05T13:38:02Z

@cliffburdick Do you need anything done from my end? I imagine if so it would be relating to the executors?

cliffburdick · 2024-08-05T14:15:53Z

@cliffburdick Do you need anything done from my end? I imagine if so it would be relating to the executors?

Hi @ZelboK is this ready to test?

cliffburdick · 2024-08-05T14:15:59Z

/build

ZelboK · 2024-08-05T14:48:11Z

@cliffburdick Do you need anything done from my end? I imagine if so it would be relating to the executors?

Hi @ZelboK is this ready to test?

Yeah I recall running the tests which seemed to pass. Only thing is(even on main) some irrelevant tests segfault. Not sure if I am able to reproduce a good testing environment locally.

cliffburdick · 2024-08-05T14:52:39Z

Hi @ZelboK it's running, but we require all of our tests to pass. Unless it's an environment issue on your end we'll have to debug it.

cliffburdick · 2024-08-07T19:52:37Z

Hi @ZelboK , it looks like we're getting an exception in the tests:

[----------] 3 tests from ReductionTestsNumericNonComplexAllExecs/0, where TypeParam = cuda::std::__4::tuple<unsigned int, matx::cudaExecutor>
[ RUN      ] ReductionTestsNumericNonComplexAllExecs/0.Any
matxException (matxCudaError: ) - /home/jenkins/workspace/unit-tests/include/matx/operators/any.h:87

Stack Trace:
 test/matx_test : ()+0x1918f16
 test/matx_test : ()+0x9087ea6
 test/matx_test : ()+0x89fa02c
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x69
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x5e
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::Test::Run()+0x104
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::TestInfo::Run()+0x146
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::TestSuite::Run()+0x140
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::internal::UnitTestImpl::RunAllTests()+0x40a
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x69
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x5e
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::UnitTest::Run()+0xcd
 test/matx_test : ()+0x18eda91
 test/matx_test : ()+0x18ed0b1
 /usr/lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
 /usr/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
 test/matx_test : _start()+0x25
rv: Error in any(1 != 0)

Have you tried running them?

ZelboK · 2024-08-08T01:12:47Z

Hi @ZelboK , it looks like we're getting an exception in the tests:

[----------] 3 tests from ReductionTestsNumericNonComplexAllExecs/0, where TypeParam = cuda::std::__4::tuple<unsigned int, matx::cudaExecutor>
[ RUN      ] ReductionTestsNumericNonComplexAllExecs/0.Any
matxException (matxCudaError: ) - /home/jenkins/workspace/unit-tests/include/matx/operators/any.h:87

Stack Trace:
 test/matx_test : ()+0x1918f16
 test/matx_test : ()+0x9087ea6
 test/matx_test : ()+0x89fa02c
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x69
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x5e
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::Test::Run()+0x104
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::TestInfo::Run()+0x146
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::TestSuite::Run()+0x140
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::internal::UnitTestImpl::RunAllTests()+0x40a
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x69
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x5e
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::UnitTest::Run()+0xcd
 test/matx_test : ()+0x18eda91
 test/matx_test : ()+0x18ed0b1
 /usr/lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
 /usr/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
 test/matx_test : _start()+0x25
rv: Error in any(1 != 0)

Have you tried running them?

Hi,

Let me take a look when I have time later around Friday. Unfortunately I have less free time now than I'd like. I'll report back if I have troubles running the relevant tests. Curious to know if the test pipeline is only available through the NVIDIA VPN as an employee? I am not able to access the link.

cliffburdick · 2024-08-08T04:24:52Z

Hi @ZelboK , it looks like we're getting an exception in the tests:

[----------] 3 tests from ReductionTestsNumericNonComplexAllExecs/0, where TypeParam = cuda::std::__4::tuple<unsigned int, matx::cudaExecutor>
[ RUN      ] ReductionTestsNumericNonComplexAllExecs/0.Any
matxException (matxCudaError: ) - /home/jenkins/workspace/unit-tests/include/matx/operators/any.h:87

Stack Trace:
 test/matx_test : ()+0x1918f16
 test/matx_test : ()+0x9087ea6
 test/matx_test : ()+0x89fa02c
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : void testing::internal::HandleSehExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x69
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : void testing::internal::HandleExceptionsInMethodIfSupported<testing::Test, void>(testing::Test*, void (testing::Test::*)(), char const*)+0x5e
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::Test::Run()+0x104
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::TestInfo::Run()+0x146
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::TestSuite::Run()+0x140
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::internal::UnitTestImpl::RunAllTests()+0x40a
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : bool testing::internal::HandleSehExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x69
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : bool testing::internal::HandleExceptionsInMethodIfSupported<testing::internal::UnitTestImpl, bool>(testing::internal::UnitTestImpl*, bool (testing::internal::UnitTestImpl::*)(), char const*)+0x5e
 /home/jenkins/workspace/unit-tests/build/lib/libgtestd.so.1.11.0 : testing::UnitTest::Run()+0xcd
 test/matx_test : ()+0x18eda91
 test/matx_test : ()+0x18ed0b1
 /usr/lib/x86_64-linux-gnu/libc.so.6 : ()+0x29d90
 /usr/lib/x86_64-linux-gnu/libc.so.6 : __libc_start_main()+0x80
 test/matx_test : _start()+0x25
rv: Error in any(1 != 0)

Have you tried running them?

Hi,

Let me take a look when I have time later around Friday. Unfortunately I have less free time now than I'd like. I'll report back if I have troubles running the relevant tests. Curious to know if the test pipeline is only available through the NVIDIA VPN as an employee? I am not able to access the link.

Hi @ZelboK , unfortunately no, there's no way you can view it, but I can look into that. You should be able to reproduce the exact same error by compiling the unit tests, then from your build directory run:

test/matx_test --gtest_filter="*0.Any*"

ZelboK · 2024-08-11T15:18:51Z

Hi @ZelboK , unfortunately no, there's no way you can view it, but I can look into that. You should be able to reproduce the exact same error by compiling the unit tests, then from your build directory run:
test/matx_test --gtest_filter="*0.Any*"

@cliffburdick Let me take a look now. By any chance are you ever on the nvidia developer discord btw? I know a lot of other teams like CCCL and cutlass participate.

cliffburdick · 2024-08-11T16:23:31Z

Hi @ZelboK , unfortunately no, there's no way you can view it, but I can look into that. You should be able to reproduce the exact same error by compiling the unit tests, then from your build directory run:
test/matx_test --gtest_filter="*0.Any*"
@cliffburdick Let me take a look now. By any chance are you ever on the nvidia developer discord btw? I know a lot of other teams like CCCL and cutlass participate.

I didn't even know that existed, but I will try it out today or tomorrow.

Refactor to use thrust::reduce on any.

d0f3789

ZelboK commented Jul 27, 2024

View reviewed changes

cliffburdick requested changes Jul 28, 2024

View reviewed changes

ZelboK added 3 commits July 29, 2024 21:18

Account for the fact that some operators will not have a .Data() meth…

6dc560a

…od. Refactor so that the operator_utils is capable of dispatching to thrust

Clean up

15358ec

minor clean up

350b292

ZelboK commented Jul 30, 2024

View reviewed changes

include/matx/operators/any.h Show resolved Hide resolved

ZelboK commented Jul 30, 2024

View reviewed changes

revert changes to reduce.h and simplify design in any.h to use thrust

55d78c0

ZelboK commented Jul 30, 2024

View reviewed changes

remove changes to example

b8ea6c2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor to use thrust::reduce on any. #685

Refactor to use thrust::reduce on any. #685

ZelboK commented Jul 27, 2024 •

edited

Loading

ZelboK Jul 27, 2024

cliffburdick Jul 28, 2024

ZelboK Jul 28, 2024 •

edited

Loading

cliffburdick Jul 28, 2024

ZelboK Jul 30, 2024

ZelboK Jul 27, 2024

cliffburdick left a comment

ZelboK Jul 30, 2024

ZelboK Jul 30, 2024

ZelboK Jul 30, 2024

cliffburdick Jul 30, 2024

ZelboK Jul 30, 2024 •

edited

Loading

ZelboK Jul 30, 2024

ZelboK commented Aug 5, 2024

cliffburdick commented Aug 5, 2024

cliffburdick commented Aug 5, 2024

ZelboK commented Aug 5, 2024 •

edited

Loading

cliffburdick commented Aug 5, 2024

cliffburdick commented Aug 7, 2024 •

edited

Loading

ZelboK commented Aug 8, 2024

cliffburdick commented Aug 8, 2024

ZelboK commented Aug 11, 2024 •

edited

Loading

cliffburdick commented Aug 11, 2024

Refactor to use thrust::reduce on any. #685

Are you sure you want to change the base?

Refactor to use thrust::reduce on any. #685

Conversation

ZelboK commented Jul 27, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZelboK Jul 28, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cliffburdick left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZelboK Jul 30, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

ZelboK commented Aug 5, 2024

cliffburdick commented Aug 5, 2024

cliffburdick commented Aug 5, 2024

ZelboK commented Aug 5, 2024 • edited Loading

cliffburdick commented Aug 5, 2024

cliffburdick commented Aug 7, 2024 • edited Loading

ZelboK commented Aug 8, 2024

cliffburdick commented Aug 8, 2024

ZelboK commented Aug 11, 2024 • edited Loading

cliffburdick commented Aug 11, 2024

ZelboK commented Jul 27, 2024 •

edited

Loading

ZelboK Jul 28, 2024 •

edited

Loading

ZelboK Jul 30, 2024 •

edited

Loading

ZelboK commented Aug 5, 2024 •

edited

Loading

cliffburdick commented Aug 7, 2024 •

edited

Loading

ZelboK commented Aug 11, 2024 •

edited

Loading