How to set the number of parallel threads? #66

Open
learning-chip opened this issue Apr 14, 2022 · 4 comments

@learning-chip
learning-chip commented Apr 14, 2022

The benchmark section shows faster execution with more threads. However, I cannot reproduce such parallel scaling.

The benchmark script:

```julia
using Random: seed!
using SparseArrays
using SuiteSparseGraphBLAS
using BenchmarkTools

@show Sys.CPU_THREADS
@show get(ENV, "OMP_NUM_THREADS", nothing)
@show get(ENV, "MKL_NUM_THREADS", nothing)
@show get(ENV, "OPENBLAS_NUM_THREADS", nothing)
@show get(ENV, "JULIA_NUM_THREADS", nothing)

seed!(0)
A = sprand(Float64, 10000, 10000, 0.05)
B = sprand(Float64, 10000, 1000, 0.1)

# @btime A * B

A_gb = GBMatrix(A)
B_gb = GBMatrix(B)

@btime A_gb * B_gb
```

Run with:

```shell
export OMP_NUM_THREADS=1
export MKL_NUM_THREADS=1
export OPENBLAS_NUM_THREADS=1
export JULIA_NUM_THREADS=1
julia ./graphblas_timing
```

No matter which of the *_NUM_THREADS variables I change, the execution time stays the same (~180 ms on my 112-core machine). That is about 7x faster than the built-in sparse matmul (~1.2 s). However, I don't know how many threads it is actually using, and I don't seem to be able to change the count.

@rayegun
Member

rayegun commented Apr 14, 2022

Use SuiteSparseGraphBLAS.gbset(:nthreads, <NUMTHREADS>) to set the thread count. You can read the current value with SuiteSparseGraphBLAS.gbget(:nthreads).

This interface needs to be both improved and better documented, sorry about that. Currently the package calls gbset(:nthreads, Sys.CPU_THREADS ÷ 2) on startup. I will likely change that at some point to use one of the environment variables above.
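Until the package reads an environment variable itself, a user-side workaround could look like the sketch below. The helper name `threads_from_env` is hypothetical and not part of SuiteSparseGraphBLAS.jl. Using `OMP_NUM_THREADS` as the source and falling back to `Sys.CPU_THREADS ÷ 2` (the startup default mentioned above) are assumptions made for illustration.

```julia
# Hypothetical helper, not part of SuiteSparseGraphBLAS.jl.
# Reads a thread count from an environment-like mapping and falls back to
# `default` when the variable is unset, not an integer, or less than 1.
function threads_from_env(env=ENV; var="OMP_NUM_THREADS",
                          default=Sys.CPU_THREADS ÷ 2)
    raw = get(env, var, nothing)
    raw === nothing && return default
    n = tryparse(Int, strip(raw))   # `nothing` for values such as "4,2"
    return (n === nothing || n < 1) ? default : n
end
```

The result could then be passed along right after loading the package, for example `SuiteSparseGraphBLAS.gbset(:nthreads, threads_from_env())`.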

Note that it probably isn't going to use 56 threads on a problem of that size (or, if it does, it won't scale well). For most of the internal kernels you can observe what's happening with gbset(:burble, true). That makes SuiteSparse:GraphBLAS print its internal diagnostic information, which includes the number of threads used.
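To check whether the setting changes the runtime, one option is a small sweep over thread counts. The sketch below is generic Julia: `thread_sweep` is a hypothetical helper that takes the setter and the workload as functions, so it does not depend on any particular package API.

```julia
# Hypothetical helper: for each thread count, apply `set_threads!(n)`,
# run `work()` once as a warm-up, then record the best of `repeats` timings.
function thread_sweep(set_threads!, work, counts; repeats=3)
    results = Pair{Int,Float64}[]
    for n in counts
        set_threads!(n)
        work()  # warm-up so compilation time is not measured
        best = minimum(@elapsed(work()) for _ in 1:repeats)
        push!(results, n => best)
    end
    return results
end
```

With the matrices from the benchmark script above, it could be called as `thread_sweep(n -> SuiteSparseGraphBLAS.gbset(:nthreads, n), () -> A_gb * B_gb, (1, 2, 4, 8))`. If the timings stay flat as the count grows, the problem is likely too small to benefit from more threads, or the setting is not taking effect.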

@learning-chip
Author

SuiteSparseGraphBLAS.gbset(:nthreads, <NUMTHREADS>)

This works well, thanks!

@rayegun
Member

rayegun commented Apr 15, 2022

I'm going to leave this open until I find a better interface.

@corbett5

In a similar vein, do you know if multi-threading works on Apple ARM chips? Changing the number of threads with gbset changes the thread count reported by burble, but it has no effect on the runtime or the CPU usage.
