This repository is mostly forked from ClusterManagers.jl, adapted for using Julia interactively on a Slurm cluster. I made some hacky changes:
- The `addprocs_slurm` function now properly handles the `cpus_per_task` argument and sets the environment variable `JULIA_NUM_THREADS` accordingly. Furthermore, threaded BLAS can also be enabled.
- Some default Slurm arguments are added; each can be overridden by passing the corresponding keyword to `addprocs_slurm` (see the sketch after this list):

  | Slurm parameter    | Default value       | Notes                      |
  | ------------------ | ------------------- | -------------------------- |
  | `ntasks`           | `1`                 | how many processes         |
  | `cpus_per_task`    | `1`                 | threads per process        |
  | `threads_per_core` | `1`                 | disable hyper-threading    |
  | `topology`         | `:master_worker`    | architecture hint          |
  | `job_file_loc`     | `pwd() * "/output"` | log output (relative path) |
  | `t`                | `"1000"`            | unit: min                  |
- Now, `Base.Threads` and `LinearAlgebra` are loaded on every worker as soon as `addprocs_slurm` has successfully connected all the required nodes, which is equivalent to running the following in global scope:

  ```julia
  @everywhere using LinearAlgebra, Base.Threads
  ```
- The `worker_info` function reports information about a worker (using the `Hwloc` package) as a dictionary with entries like

  ```julia
  "worker_id"    => myid(),                    # worker ID
  "cpu"          => Sys.cpu_info()[1].model,   # CPU model
  "hwinfo"       => getinfo(),                 # architecture info (Hwloc)
  "nthreads"     => Threads.nthreads(),        # Julia threads
  "blas_threads" => BLAS.get_num_threads(),    # BLAS threads
  "blas_config"  => BLAS.get_config(),         # BLAS configuration
  "mem_free_GB"  => Sys.free_memory() / (2^30) # free memory in GB
  ```

  and can be queried from the master via `remotecall_fetch`:

  ```julia
  worker_info_i = remotecall_fetch(worker_info, i)
  ```
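Here is a minimal sketch of overriding these defaults, as referenced in the list above; the partition name `"debug"` and the specific values are placeholders, and only keyword names from the table are used:

```julia
using Distributed, SlurmManagers

# Override the defaults by passing the corresponding keywords to addprocs_slurm.
addprocs_slurm("debug";
    ntasks = 4,                     # 4 worker processes
    cpus_per_task = 8,              # 8 threads per process (sets JULIA_NUM_THREADS)
    job_file_loc = pwd() * "/logs", # write Slurm logs to ./logs instead of ./output
    t = "120",                      # time limit of 120 minutes
)

# Each worker should now report the requested thread count.
remotecall_fetch(() -> Threads.nthreads(), 2)  # expected: 8
```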
The functionality has been tested on a Slurm cluster with Intel Xeon CPUs. We assume that `MKL.jl` is being used, and `enable_MKL=true` by default. If you use OpenBLAS, or you want to run on a cluster with AMD CPUs, it is recommended to set `enable_MKL=false`.
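For example, assuming `enable_MKL` is passed as a keyword argument to `addprocs_slurm` (a sketch; the partition name is a placeholder):

```julia
# On an AMD cluster, or when OpenBLAS is preferred, skip the MKL setup.
addprocs_slurm("yourPartition"; ntasks = 8, cpus_per_task = 12, enable_MKL = false)
```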
You can install the package by running

```julia
using Pkg
Pkg.add(url="https://github.com/PDE2718/SlurmManagers.jl")
```
First, let's import some packages. Note that `MKL`/`LinearAlgebra` are imported everywhere implicitly when you use `SlurmManagers`. Other packages should be decorated with the `@everywhere` macro.
```julia
using MKL, LinearAlgebra
using Distributed
using SlurmManagers
```
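For any other package your jobs need, the same pattern applies once the workers are up; `Statistics` below is just an arbitrary example, not something required by `SlurmManagers`:

```julia
# Run this after addprocs_slurm (next step) so the new workers load it as well.
@everywhere using Statistics
```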
Now, add the processes and set up the worker pool `wpool`:

```julia
addprocs_slurm("yourPartition"; ntasks=8, cpus_per_task=12)
wpool = WorkerPool(workers())
```
This test was done on a cluster where each node has two Intel Xeon Gold 6240R sockets with 24 cores each, i.e. 48 cores per node. Slurm automatically assigned 2 nodes, each running 4 tasks with 12 physical cores per task.
We can now interact through the master process and distribute our jobs dynamically! First, let's check some information:

```julia
remotecall_fetch(myid, 2)        # returns 2
remotecall_fetch(worker_info, 3) # returns a Dict
```
Let's define a job `myfun` that takes some input argument, say `x`. We just need to define it on each worker:
```julia
@everywhere begin
    function myfun(x::Real)
        N = 1000
        A = rand(N, N) + x * I |> Symmetric  # random symmetric matrix shifted by x
        t = @elapsed eigvals(A)              # time a dense eigenvalue computation
        return t
    end
end
```
Now we can run it on any worker with `remotecall` and `fetch`, or more simply with a single `remotecall_fetch`. For example, we pass `x = 5.0` to worker 4, wait until it finishes its job, and get the result back:

```julia
remotecall_fetch(myfun, 4, 5.0)  # returns t, computed on the remote worker
```
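The two-step `remotecall`/`fetch` version mentioned above looks like this (a sketch using only the standard `Distributed` API):

```julia
fut = remotecall(myfun, 4, 5.0)  # returns a Future immediately, without blocking
# ... the master is free to do other work here ...
t = fetch(fut)                   # blocks until the remote result is available
```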
Finally, a quick comparison between serial broadcasting on a local machine and `pmap` over the Slurm worker pool:

```julia
xs = rand(200)

# 20-core local machine, limited by memory bandwidth
@elapsed myfun.(xs)                                  # 14.27 s

# 8 workers with 12 cores each => 96 cores in total
@elapsed @sync pmap(myfun, wpool, xs; batch_size=2)  # 1.18 s
```
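When the interactive session is finished, the workers can be released with the standard `Distributed` call below; whether this also frees the underlying Slurm allocation depends on how the tasks were launched, so treat it as a sketch.

```julia
rmprocs(workers())  # shut down all worker processes
```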