Try CuPy integration w/ Dask to see what, if any, operations benefit from GPU acceleration #6
I was checking in on Vaex recently and saw https://www.kaggle.com/jovanveljanoski/vaex-on-kaggle-gpu-performance-test, where they use CuPy. We probably want to work with Dask arrays and CuPy directly rather than via Vaex, but I thought I'd point it out as an easy way to try CuPy.
I tried swapping out NumPy for CuPy arrays as Dask chunks in qc_call_rate_benchmarking_cuda.ipynb, but the results were not great. What takes about 30 seconds in the original notebook as a parallel CPU implementation takes more like a minute with CuPy-backed Dask arrays. The time varies quite a bit with chunk size, but about 100% slower was the fastest I could get it. On the other hand, using Numba's cuda.jit to write kernels that aren't even possible with CuPy looks to be a win for LD prune (#26), so it's looking like for equal dollars spent on GPUs and CPUs, GPUs will only make sense for pairwise algorithms (or ones with worse complexity). These are pretty rough benchmarks though, so it's definitely worth testing simpler things with CuPy more as the example workflows pile up.
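For reference, a minimal sketch of the chunk-backend swap described above. The array shape, chunking, and the `call_rate` helper are hypothetical stand-ins rather than code from the notebook; only the `map_blocks(cp.asarray)` pattern is the actual mechanism for backing a Dask array with CuPy chunks:

```python
import numpy as np
import dask.array as da

try:
    import cupy as cp  # requires a CUDA-capable GPU
except ImportError:
    cp = None

# Hypothetical genotype call matrix: rows are variants, columns are
# samples, and -1 marks a missing call.
rng = np.random.default_rng(0)
calls = rng.integers(-1, 3, size=(100_000, 1_000), dtype=np.int8)

# NumPy-backed Dask array: the parallel CPU baseline.
x_cpu = da.from_array(calls, chunks=(10_000, 1_000))

def call_rate(x):
    # Fraction of non-missing calls per variant; plain array code
    # that works with either backend.
    return (x >= 0).mean(axis=1)

rate_cpu = call_rate(x_cpu).compute()

if cp is not None:
    # Swap each chunk's backend to CuPy; Dask then dispatches chunk
    # operations to the CuPy implementations.
    x_gpu = x_cpu.map_blocks(cp.asarray)
    rate_gpu = call_rate(x_gpu).compute()  # chunks are cupy.ndarray
```

And a sketch of the kind of custom pairwise kernel that cuda.jit allows but whole-array CuPy operations don't express directly. The kernel and shapes are illustrative only, not the LD prune code from #26:

```python
from numba import cuda
import numpy as np

@cuda.jit
def pairwise_dot(x, out):
    # One thread per (i, j) pair of rows; per-pair inner loops like
    # this are awkward to write as CuPy array operations.
    i, j = cuda.grid(2)
    n, m = x.shape
    if i < n and j < n:
        s = 0.0
        for k in range(m):
            s += x[i, k] * x[j, k]
        out[i, j] = s

x = np.random.random((256, 64)).astype(np.float32)
out = np.zeros((256, 256), dtype=np.float32)
threads = (16, 16)
blocks = ((x.shape[0] + 15) // 16, (x.shape[0] + 15) // 16)
pairwise_dot[blocks, threads](x, out)  # Numba copies host arrays to/from device
```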
That's an interesting finding, that some workloads are best on CPU and some are best on GPU. It makes transparent dispatch to different backends more valuable.
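As one concrete illustration of that kind of dispatch, NumPy's `__array_function__` protocol (NEP 18) already lets the same NumPy-style code route to whichever backend owns the array. The `standardize` helper here is a made-up example:

```python
import numpy as np

try:
    import cupy as cp
except ImportError:
    cp = None

def standardize(x):
    # Plain NumPy calls; via __array_function__ they dispatch to the
    # implementation belonging to x's backend.
    return (x - np.mean(x, axis=0)) / np.std(x, axis=0)

a = np.random.random((4, 3))
print(type(standardize(a)))      # <class 'numpy.ndarray'>

if cp is not None:
    b = cp.asarray(a)
    print(type(standardize(b)))  # <class 'cupy.ndarray'>, same code path
```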