An introductory supercomputing course developed by the SHAO SKA team.
China SKA Regional Centre Prototype (CSRC-P) systems use Slurm for resource and job management, which avoids mutual interference between users and improves operational efficiency. All jobs, whether for program debugging or production calculations, must be submitted through the interactive/parallel srun command, the batch submission sbatch command, or the resource allocation salloc command; after submission, related commands can be used to query the job status (a quick sketch of all three commands follows the contents list below). Please do not run jobs directly on the login node (compiling excepted), so as not to affect the normal use of other users.
- Querying SLURM Partitions
- Querying the Queue
- Job Request
- Jobscripts
- Serial Python Example
- Launching Parallel Programs
- GPU Example
- Interactive Jobs
- Cancelling Jobs
- Modules
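As a quick orientation, the three submission commands mentioned above are used roughly as follows. This is a minimal sketch; the partition name, script name and resource numbers are placeholders to adapt to your own work:
# run a single command on a compute node interactively
srun -p purley-cpu -n 1 hostname
# submit a jobscript to run in batch mode
sbatch myjob.sh
# request an allocation for interactive work, then launch steps inside it with srun
salloc -p purley-cpu -n 1 --time=00:10:00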
A SLURM partition is a queue: a group of compute nodes, with its own limits, managed by the SLURM daemon.
To list the partitions when logged into a machine:
sinfo
To get all partitions in all local clusters:
sinfo -M all
For example:
[blao@x86-logon01 ~]$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
arm up infinite 1 down* taishan-arm-cpu10
arm up infinite 1 drain taishan-arm-cpu01
arm up infinite 8 idle taishan-arm-cpu[02-09]
purley-cpu* up infinite 1 alloc purley-x86-cpu01
purley-cpu* up infinite 7 idle purley-x86-cpu[02-08]
sugon-gpu up infinite 1 idle sugon-gpu01
inspur-gpu-opa up infinite 1 mix inspur-gpu02
inspur-gpu-ib up infinite 1 down* inspur-gpu01
knm up infinite 4 down* knm-x86-cpu[01-04]
knl up infinite 1 down* knl-x86-cpu02
all-gpu up infinite 1 down* inspur-gpu01
all-gpu up infinite 1 mix inspur-gpu02
all-gpu up infinite 1 idle sugon-gpu01
Here PARTITION is the partition name. AVAIL is the availability of the partition: up means available, down means unavailable. TIMELIMIT is the maximum run time allowed for jobs, where infinite means unlimited; otherwise the format is days-hours:minutes:seconds. NODES is the number of nodes. NODELIST is the list of nodes. STATE is the running state of the nodes.
STATE: node state; possible states include:
- allocated, alloc: allocated to jobs
- completing, comp: jobs on the node are completing
- down: node is down and unavailable
- drained, drain: node has been drained and is not accepting new jobs (typically for maintenance)
- fail: node has failed
- idle: idle and available for jobs
- mixed, mix: the node is running jobs, but some idle CPU cores can still accept new jobs
- reserved, resv: node is in a resource reservation
- unknown, unk: state unknown
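To check the detailed state of the nodes in a particular partition, or of a single node, sinfo and scontrol can be used as in the following sketch; the partition and node names are just examples taken from the listing above:
# per-node state for one partition (node-oriented, long output)
sinfo -p purley-cpu -N -l
# full details for a single node, including the reason if it is drained or down
scontrol show node purley-x86-cpu01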
It is important to use the correct system and partition for each part of a workflow:
Partition | Purpose |
---|---|
arm | Many-core workflows: many serial or shared-memory (OpenMP) jobs, large distributed-memory (MPI) jobs |
sugon-gpu, inspur-gpu-opa, inspur-gpu-ib, all-gpu | GPU-accelerated jobs, artificial intelligence jobs |
purley-cpu, hw, all-x86-cpu | Many serial or shared-memory (OpenMP) jobs, large distributed-memory (MPI) jobs |
squeue is used to view job and job step information for jobs managed by Slurm.
squeue <options>
squeue -u username
squeue -j jobid
squeue -p purley-cpu
Example:
[blao@x86-logon01 ~]$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
17810 inspur-gp step.bat xzj R 3-03:15:24 1 inspur-gpu02
17921 inspur-gp test.bat xzj R 1-19:45:57 1 inspur-gpu02
17931 purley-cp test.sh feirui20 R 1:04:06 1 purley-x86-cpu01
Interpretation of the fields:
JOBID: ID of the job.
PARTITION: partition used by the job.
NAME: job name.
USER: user name.
ST: job state. R=running, PD=pending, CA=cancelled, CG=completing, CD=completed.
TIME: time used by the job.
NODES: the actual number of nodes allocated to the job.
NODELIST: list of nodes allocated to the job or job step.
To view more options of squeue:
squeue --help
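The squeue output can also be customised with a format string; for example, the following sketch lists only your own jobs with a wider job-name column (the exact format string is just an illustration):
# job ID, partition, name, state, elapsed time and node list for your jobs
squeue -u $USER --format="%.10i %.12P %.30j %.8T %.12M %R"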
Individual Job Information
scontrol show job jobid
[blao@x86-logon01 ~]$ scontrol show job 17810
JobId=17810 JobName=step.batch
UserId=xzj(10015) GroupId=xzj(10019) MCS_label=N/A
Priority=4294901731 Nice=0 Account=xzj QOS=normal
JobState=RUNNING Reason=None Dependency=(null)
Requeue=1 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
RunTime=3-03:41:48 TimeLimit=UNLIMITED TimeMin=N/A
SubmitTime=2020-09-23T11:16:25 EligibleTime=2020-09-23T11:16:25
AccrueTime=2020-09-23T11:16:25
StartTime=2020-09-23T11:16:26 EndTime=Unknown Deadline=N/A
PreemptTime=None SuspendTime=None SecsPreSuspend=0
LastSchedEval=2020-09-23T11:16:26
Partition=inspur-gpu-opa AllocNode:Sid=x86-logon01:339306
ReqNodeList=(null) ExcNodeList=(null)
NodeList=inspur-gpu02
BatchHost=inspur-gpu02
NumNodes=1 NumCPUs=1 NumTasks=1 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
TRES=cpu=1,mem=3001M,node=1,billing=1,gres/gpu=1
Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
MinCPUsNode=1 MinMemoryNode=3001M MinTmpDiskNode=0
Features=(null) DelayBoot=00:00:00
OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
Command=/home/xzj/STEP/AI/step.batch
WorkDir=/home/xzj/STEP/AI
StdErr=/home/xzj/STEP/AI/step.out
StdIn=/dev/null
StdOut=/home/xzj/STEP/AI/step.out
Power=
TresPerNode=gpu:1
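scontrol only knows about jobs that are still pending or running. For jobs that have already finished, and assuming job accounting is enabled on the system, sacct can be used instead; a short sketch:
# summary of a finished job (replace 17810 with your own job ID)
sacct -j 17810 --format=JobID,JobName,Partition,State,Elapsed,MaxRSS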
SLURM needs to know two things from you: what resources you need, and what commands to run.
- You cannot submit an application directly to SLURM. Instead, SLURM executes a list of shell commands on your behalf.
- In batch mode, SLURM executes a jobscript which contains the commands.
- In interactive mode, you type in the commands just as when you log in.
- These commands can include launching programs onto the compute nodes assigned to the job.
- Directive lines start with #SBATCH.
- These are equivalent to sbatch command-line arguments.
- Directives are usually more convenient and reproducible than command-line arguments, so put your resource request into the jobscript.
The jobscript will execute on one of the allocated compute nodes.
#SBATCH directives are shell comments, so only the subsequent commands are executed; SLURM reads the directives when the job is submitted.
Example jobscript (hostname.sh)
#!/bin/bash
#SBATCH --job-name=myjob #makes it easier to find in squeue
#SBATCH --partition=purley-cpu # partition name
#SBATCH --nodes=2 # number of nodes
#SBATCH --tasks-per-node=1 #processes per node
#SBATCH --cpus-per-task=1 #cores per process
#SBATCH --time=00:05:00 # walltime requested
#SBATCH --export=NONE # start with a clean environment. This improves reproducibility and avoids contamination of the environment.
#The next line is executed on the compute node
srun /usr/bin/hostname
Submit the jobscript with:
sbatch hostname.sh
To see all sbatch options:
sbatch --help
Standard output and standard error from your jobscript are collected by SLURM and written to a file in the directory from which you submitted the job, once the job finishes or dies:
slurm-jobid.out
cat slurm-jobid.out
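If you prefer a different file name or location, the output and error files can be set explicitly with #SBATCH directives; a minimal sketch, where %j is the sbatch filename pattern that expands to the job ID:
#SBATCH --output=myjob-%j.out    # standard output file
#SBATCH --error=myjob-%j.err     # standard error in a separate file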
Serial Python example (hello-serial.sh). This jobscript will run on a single core of SHAO’s cluster for up to 5 minutes:
#!/bin/bash
#SBATCH --job-name=hello-serial
#SBATCH --partition=purley-cpu
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --time=00:05:00
#SBATCH --export=NONE
# load modules
module use /home/app/modulefiles
module load python/cpu-3.7.4
# launch serial python script
python3 hello-serial.py
The script can be submitted to the scheduler with:
sbatch hello-serial.sh
Parallel applications are launched using srun.
The arguments determine the parallelism:
-N number of nodes
-n number of tasks (for process parallelism e.g. MPI)
-c cores per task (for thread parallelism e.g. OpenMP)
Although these values are already given in the #SBATCH directives, they should be repeated in the srun arguments.
OpenMP example (hello-openmp.sh). This will run 1 process with 24 threads on 1 purley-cpu compute node, using 24 cores, for up to 5 minutes:
#!/bin/bash
#SBATCH --job-name=hello-openmp
#SBATCH --partition=purley-cpu
#SBATCH --nodes=1
#SBATCH --tasks-per-node=1
#SBATCH --cpus-per-task=24
#SBATCH --time=00:05:00
#SBATCH --export=NONE
# set OpenMP environment variables
export OMP_NUM_THREADS=24
export OMP_PLACES=cores
export OMP_PROC_BIND=close
# launch OpenMP program
srun --export=all -n 1 -c ${OMP_NUM_THREADS} ./hello-openmp-gcc
The program can be compiled and the script can be submitted to the scheduler with:
cd hello-openmp
make
sbatch hello-openmp.sh
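To avoid keeping the thread count and the --cpus-per-task request in sync by hand, OMP_NUM_THREADS can be derived from the SLURM_CPUS_PER_TASK variable that Slurm sets inside the job; a small sketch of the relevant jobscript lines, assuming --cpus-per-task is set in the directives as above:
# let the OpenMP thread count follow the Slurm request
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK}
srun --export=all -n 1 -c ${OMP_NUM_THREADS} ./hello-openmp-gcc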
MPI example (hello-mpi.sh). This will run 24 MPI processes on 1 node of purley-cpu:
#!/bin/bash
#SBATCH --job-name=hello-mpi
#SBATCH --partition=purley-cpu
#SBATCH --nodes=1
#SBATCH --tasks-per-node=24
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=4GB
#SBATCH --time=00:05:00
#SBATCH --export=NONE
# prepare MPI environment
module use /home/app/modulefiles
module load mpich/cpu-3.2.1-gcc-7.3.0
# launch MPI program
srun --export=all --mpi=pmi2 -N 1 -n 24 ./hello-mpi
The script can be submitted to the scheduler with:
cd hello-mpi
module load mpich/cpu-3.2.1-gcc-7.3.0
make
sbatch hello-mpi.sh
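To scale the same example across more than one node, only the node and task counts need to change. The sketch below assumes 24 usable cores per purley-cpu node; adjust the numbers to match the actual hardware:
# request two nodes with 24 MPI tasks each
#SBATCH --nodes=2
#SBATCH --tasks-per-node=24
# launch 48 MPI processes across the two nodes
srun --export=all --mpi=pmi2 -N 2 -n 48 ./hello-mpi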
GPU example (hello-gpu.sh). This will use one GPU device on a GPU node:
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=00:05:00
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:1 #number of gpu devices
#SBATCH --mem=8g
#SBATCH --partition=inspur-gpu-opa
#SBATCH --export=NONE
# prepare GPU environment
module use /home/app/modulefiles
module load cuda/9.0
# launch GPU program
srun --export=all -N 1 -n 1 ./hello-gpu
The script can be submitted to the scheduler with:
cd hello-gpu
module load cuda/9.0
nvcc -o hello-gpu hello-gpu.cu
sbatch hello-gpu.sh
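To confirm which GPU has actually been allocated to the job, nvidia-smi can be run as a job step inside the same allocation (this assumes the node has NVIDIA GPUs, as the cuda module suggests); a line that could be appended to the jobscript:
# report the GPU(s) visible to this job
srun --export=all -N 1 -n 1 nvidia-smi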
All scripts can be downloaded from this git repository:
https://github.com/lao19881213/Introductory-CSRC-P
Sometimes you need to work interactively, for example for:
- Debugging
- Compiling
- Pre/post-processing
Use salloc instead of sbatch.
You still need srun to place jobs onto compute nodes.
Options that would normally go in #SBATCH directives must instead be given as command-line arguments to salloc.
For example (compiling):
salloc --tasks=16 --partition purley-cpu --time=00:10:00
srun make -j 16
Run hello-serial.py interactively on a purley-cpu compute node.
Start an interactive session (you may need to wait while it is in the queue):
salloc --partition=purley-cpu --tasks=1 --time=00:10:00
Prepare the environment:
module use /home/app/modulefiles
module load python/cpu-3.7.4
Launch the program:
srun --export=all -n 1 python3 hello-serial.py
Exit the interactive session:
exit
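Inside an interactive allocation you can also inspect what Slurm has granted you through the environment variables it sets; a quick sketch of some commonly available ones:
# job ID and node list of the current allocation
echo $SLURM_JOB_ID
echo $SLURM_JOB_NODELIST
# confirm where job steps actually run
srun hostname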
You can also run hello-serial.py interactively using srun alone:
Prepare the environment:
module use /home/app/modulefiles
module load python/cpu-3.7.4
Launch the program:
srun --export=all -N 1 -n 1 -p purley-cpu python3 hello-serial.py
To cancel a job, use scancel with the job ID (shown by squeue):
scancel JOBID
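scancel can also select jobs by other criteria, which is handy when several jobs are queued; for example:
# cancel all of your own jobs
scancel -u $USER
# cancel jobs by name
scancel --name=myjob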
To prevent conflicts between software names and versions, applications and libraries are not installed in the standard directory locations. Modules modify the environment to make it easy to locate software, libraries, documentation, or particular versions of the software.
The module system manages these environment variables for each application:
Command | Description |
---|---|
module avail | Show available modules |
module list | List loaded modules |
module load modulename | Load a module into the current environment |
module unload modulename | Unload a module from the environment |
module swap module1 module2 | Swap a loaded module with another |
module show modulename | Show the details of a particular module |
module help modulename | Show module-specific help |
The modulefiles for users are in the /home/app/modulefiles directory. Example of loading the GCC compiler, version 9.3.0:
module use /home/app/modulefiles
module load gcc/cpu-9.3.0
Module names take the form software/cpu-version, software/arm-version, or software/gpu-version, indicating the type of machine the software runs on: x86 machines, ARM machines, or GPU machines respectively.
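To see which versions of a package are provided before loading one, module avail accepts a name prefix; a short sketch with gcc as the example package:
# make the CSRC-P modulefiles visible, then list the available gcc builds
module use /home/app/modulefiles
module avail gcc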