Minimal examples of how to submit jobs on SLURM-based or CONDOR-based computing clusters.
To find out which machines have GPUs installed, run:
condor_status -constraint 'PartitionableSlot && TotalGpus > 0' -af:h Machine TotalGPUs TotalCpus CUDADeviceName TotalMemory CUDAGlobalMemoryMb CUDACapability
condor_submit_bid 25 -i
condor_submit_bid 25 -i -append request_cpus=2 -append request_memory=4096
condor_submit_bid 25 -i -append request_cpus=4 -append request_gpus=8 -append request_memory=4096
condor_submit_bid 25 hello_condor.sub
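The contents of `hello_condor.sub` are not reproduced here. As a rough sketch only, a minimal HTCondor submit description file could look like the one written by the snippet below. The file name `hello_condor_example.sub`, the executable `hello_condor.sh`, and the resource values are assumptions chosen for illustration, not the repository's actual settings.

```shell
#!/bin/bash
# Write a hypothetical minimal submit description file.
# The quoted 'EOF' keeps $(ClusterId) and $(ProcId) literal, so HTCondor
# (not the shell) expands them at submit time.
cat > hello_condor_example.sub <<'EOF'
# Assumed wrapper script; replace with your own executable.
executable     = hello_condor.sh
output         = hello_condor.$(ClusterId).$(ProcId).out
error          = hello_condor.$(ClusterId).$(ProcId).err
log            = hello_condor.$(ClusterId).log
request_cpus   = 1
request_memory = 4096
queue
EOF
```

Such a file could then be submitted in the same way, for example with `condor_submit_bid 25 hello_condor_example.sub`.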
condor_submit_bid 25 condor_sweep.sh
A good example Another example
Get information about the cluster nodes:
sinfo -o "%20N %10c %10m %20f %20G %10P"
sinfo -o "%20N %10c %10m %20f %20G %10P" | sort | uniq -c
srun --partition=gpu --gres=gpu:1 --time=00:15:00 --cpus-per-task=4 --pty bash
sbatch hello_slurm.sh
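The contents of `hello_slurm.sh` are not reproduced here. A minimal batch script might look like the sketch below, which reuses the resource values from the interactive `srun` example. The job name, the output file pattern, and the `MSG` variable are assumptions made for illustration.

```shell
#!/bin/bash
# Hypothetical batch script; resource values mirror the interactive srun example.
#SBATCH --job-name=hello_slurm
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:15:00
#SBATCH --output=hello_slurm_%j.out

# SLURM_JOB_ID is set by SLURM; fall back to "local" when run outside a job.
MSG="Hello from SLURM job ${SLURM_JOB_ID:-local} on $(hostname)"
echo "$MSG"
```

Because the `#SBATCH` lines are ordinary comments to bash, the script can also be run directly with `bash` to check it before submitting.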
sbatch slurm_sweep.sh
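One common way to run a parameter sweep on SLURM is a job array, where each array task picks its own parameter value. The sketch below assumes this approach; the actual `slurm_sweep.sh` may work differently. The learning-rate values, the `pick_lr` helper, and the output file pattern are hypothetical.

```shell
#!/bin/bash
# Hypothetical sweep: one array task per learning rate.
#SBATCH --job-name=sweep
#SBATCH --array=0-3
#SBATCH --time=00:15:00
#SBATCH --output=sweep_%A_%a.out

# Assumed parameter grid; one entry per array index 0..3.
LEARNING_RATES=(0.1 0.01 0.001 0.0001)

# Map an array index to a learning rate.
pick_lr() {
    echo "${LEARNING_RATES[$1]}"
}

# SLURM sets SLURM_ARRAY_TASK_ID for each array task; default to 0 outside SLURM.
TASK_ID="${SLURM_ARRAY_TASK_ID:-0}"
LR="$(pick_lr "$TASK_ID")"
echo "Task ${TASK_ID}: would train with learning rate ${LR}"
```

The `echo` at the end is a placeholder for the actual training command, which would receive `${LR}` as an argument.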
- add a small training example
- add SLURM ✅
- lighter conda env
- conda sourcing
- better path def/expansion