
Running HPX on SuperMIC


About

SuperMIC has a total of 382 nodes, each with two 10-core 2.8 GHz Intel Ivy Bridge-EP processors. The 380 compute nodes each have 64 GB of memory and 500 GB of local HDD storage. 360 of the compute nodes have 2 Intel Xeon Phi 7120P (Knights Corner, also known as KNC) coprocessors; 20 of the compute nodes have 1 Intel Xeon Phi 7120P coprocessor and 1 NVIDIA Tesla K20X (source: HPC@LSU).

SuperMIC uses Torque 3.0.6 as its batch scheduler. The official documentation for SuperMIC can be viewed at http://www.hpc.lsu.edu/resources/hpc/system.php?system=SuperMIC.

Accounts and Allocations

You can create an account, request to join allocations, request new allocations, and see remaining balances on LSU HPC's account management page.

Software

SuperMIC provides its available software through environment modules.

The list of available modules can be viewed on the machine itself by running module avail, or in the cluster's software documentation.

Using modules

  • module load <module_1>[ <module_2> [<module_3> ...]]: Load module(s) <module_1>, etc. into the current session
    • Equivalent to module add <module_1>[ <module_2> [<module_3> ...]]
    • e.g. module load GCC/4.9.0 python/2.7.7/GCC-4.9.0
  • module unload <module_1>[ <module_2> [<module_3> ...]]: Unload module(s) <module_1>, etc. from the current session
    • Equivalent to module rm <module_1>[ <module_2> [<module_3> ...]]
    • e.g. module unload intel
  • module swap <module_1> <module_2>: Unload module <module_1> and load <module_2>. Typically used when two modules conflict with each other
    • Equivalent to module switch <module_1> <module_2>
  • module list: List all modules loaded in the current session
  • module purge: Unload all modules loaded in the current session

For more documentation, consult the module man page.
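As a brief example, a session that loads a compiler and an MPI module, inspects the environment, and then cleans up might look like the following sketch (the module names are taken from the lists further down this page; adjust them to the toolchain you actually need):

$ module load INTEL/14.0.2 impi/4.1.3.048/intel64            # load the Intel compiler and Intel MPI
$ module list                                                # verify what is loaded
$ module swap impi/4.1.3.048/intel64 INTEL-140-MVAPICH2/2.0  # switch MPI implementations
$ module purge                                               # start over with a clean environment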

HPX

HPX 0.9.10 is available as a module on SuperMIC:

  • hpx/0.9.10/impi-4.1.3.048-intel64 is HPX 0.9.10 compiled with Intel 14.0.2 and Intel MPI 4.1.3
  • hpx/0.9.10/impi-4.1.3.048-intel64-mic is a Xeon Phi build of HPX 0.9.10 compiled with Intel 14.0.2 and Intel MPI 4.1.3
  • hpx/0.9.10/mvapich2-2.0-INTEL-14.0.2 is HPX 0.9.10 compiled with Intel 14.0.2 and MVAPICH2 2.0
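To use one of these builds in the current session, load the corresponding module. Depending on how the module files are set up you may also need to load the matching compiler, MPI, Boost, and hwloc modules yourself; this is an assumption, and module list will show what was pulled in automatically:

$ module load hpx/0.9.10/mvapich2-2.0-INTEL-14.0.2
$ module list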

Compilers

  • Intel, PGI and GCC compilers are available on SuperMIC, with Intel 14.0.2 being the default compiler.

  • Intel 14 is available as INTEL/14.0.2. It is loaded by default.

  • Intel 15 is available as INTEL/15.0.0.

  • GCC 4.9.0 is available as gcc/4.9.0.

Boost libraries

  • Boost 1.55 can be loaded with: module load boost/1.55.0/INTEL-14.0.2.
  • If Boost built with a different compiler, a different configuration, or another version of Boost is needed, users have to download and build it themselves (a minimal sketch follows below). @brycelelbach's script used to work for many users.
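A minimal manual build with the Intel compiler could look like the sketch below; the version, source directory, and install prefix are placeholders, and the number of build jobs is arbitrary:

# after downloading and unpacking the desired Boost release (here 1.55.0) from boost.org
$ cd boost_1_55_0
$ ./bootstrap.sh --with-toolset=intel --prefix=$HOME/opt/boost-1.55.0-intel
$ ./b2 -j20 install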

MPI Libraries

  • MVAPICH2 2.0 is available as INTEL-140-MVAPICH2/2.0.
  • Intel MPI 4.1.3 is available as impi/4.1.3.048/intel64.
  • Intel MPI 5.0.1.035 is available as impi/5.0.1.035/intel64.
  • MPICH 3.0.3 is available as mpich/3.0.3/INTEL-14.0.2.
  • MPICH 3.1.1 is available as mpich/3.1.1/INTEL-14.0.2 or INTEL-140-MPICH/3.1.1.
  • OpenMPI 1.8.4 is available as openmpi/1.8.4/INTEL-14.0.2.

Other HPX prerequisite libraries

  • hwloc 1.10.0 is available as hwloc/1.10.0/INTEL-14.0.2.
  • HDF5 1.8.12 is available as hdf5/1.8.12/INTEL-140-MVAPICH2-2.0.
  • libunwind, jemalloc, and gperftools are not currently available as modules and have to be downloaded and compiled from their respective developers' web pages (see the sketch below).
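These libraries follow the usual configure/make workflow. As a rough sketch for jemalloc (the version number and install prefix below are placeholders), a build might look like:

# after downloading and unpacking a jemalloc release tarball
$ cd jemalloc-3.6.0
$ ./configure --prefix=$HOME/opt/jemalloc
$ make -j20
$ make install

HPX can then be pointed at the resulting install prefix when it is configured.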

Other software

Debugging Tools

  • Valgrind 3.9.0 is available as valgrind/3.9.0/GCC-4.9.0.
  • DDT 4.2.1 is available as ddt/4.2.1.
  • TotalView 8.12.1 is available as totalview/8.12.1.

Compilation

For information about the compilation process, take a look at the HPX Manual and the Build recipes in the HPX Documentation.
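As a rough sketch of such a build on SuperMIC, configuring and installing HPX with the MVAPICH2 toolchain might look like the following. This assumes the Boost and hwloc modules export BOOST_ROOT and HWLOC_ROOT (otherwise point the corresponding CMake variables at your own installation prefixes); exact CMake option names depend on the HPX version, so check the documentation linked above:

$ module load INTEL/14.0.2 INTEL-140-MVAPICH2/2.0 boost/1.55.0/INTEL-14.0.2 hwloc/1.10.0/INTEL-14.0.2
# assumes the HPX sources have been cloned into ./hpx
$ mkdir -p hpx/build && cd hpx/build
$ cmake .. \
      -DCMAKE_INSTALL_PREFIX=$HOME/opt/hpx \
      -DBOOST_ROOT=$BOOST_ROOT \
      -DHWLOC_ROOT=$HWLOC_ROOT
$ make -j20 install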

Running

Interactive Shells

To get an interactive development shell on one of the nodes you can issue the following command:

$ qsub -A <allocation-id> -I -q <desired-queue> -l nodes=<number-of-nodes>:ppn=20 -l walltime=<wall-time>

Where <allocation-id> is your allocation name, <number-of-nodes> is the number of nodes you would like, <desired-queue> is the queue you want to use, and <wall-time> is the maximum session time in HH:MM:SS format. ppn=20 cannot be changed: each node on SuperMIC has 20 cores, and the cluster's policy requires specifying that number so that the whole node is allocated. After the shell has been acquired, you can run your HPX application; by default it uses all available cores. Note that if you requested only one node, you do not need to use mpirun or pbsdsh.
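For example, a one-hour interactive session on two nodes in the workq queue (hpc_myalloc01 is a placeholder allocation name) could be requested with:

$ qsub -A hpc_myalloc01 -I -q workq -l nodes=2:ppn=20 -l walltime=01:00:00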

Scheduling Batch Jobs

The above-mentioned method of running HPX applications is fine for development purposes. The disadvantage of interactive sessions is that they only return once the application has finished, which might not be appropriate for longer-running applications (for example, benchmarks or larger-scale simulations). To cope with that limitation, we use batch jobs.

For a batch job you need a script that the scheduler can run once the requested resources are available. To request resources, add #PBS directives to your script or provide the necessary parameters to qsub directly. The commands you need to execute are the same ones you would use to start your application in an interactive shell.

Example batch script

The following example script runs hello_world on 2 nodes and schedules it in the workq queue.

example.pbs:

#!/bin/bash

#PBS -q workq
#PBS -l nodes=2:ppn=20
#PBS -l walltime=00:05:00
#PBS -o example.out
#PBS -e example.err
#PBS -j oe
#PBS -N ExampleJob

# Write a deduplicated node list that every process can read, then unset
# PBS_NODEFILE (see "Running HPX applications with the MPI parcelport" below)
uniq $PBS_NODEFILE >actual.nodes
unset PBS_NODEFILE

mpirun -f actual.nodes ./build/bin/hello_world

To schedule the script, run the following:

qsub example.pbs

Running HPX applications with the TCP parcelport

Running TCP HPX applications on SuperMIC can be done by using the pbsdsh command.

Note

pbsdsh does not pass some important environment variables (such as LD_LIBRARY_PATH) to the application. Wrapping the execution in a setup script that prepares the environment is one solution to this problem. One such script looks like this:

#!/bin/bash
# File Name: env.sh
export LD_LIBRARY_PATH="/usr/local/packages/mvapich2/2.0/INTEL-14.0.2/lib:/usr/local/compilers/Intel/cluster_studio_xe_2013.1.046/composer_xe_2013_sp1.2.144/compiler/lib/intel64:/usr/local/compilers/Intel/cluster_studio_xe_2013.1.046/composer_xe_2013_sp1.2.144/mkl/lib/intel64"

"$@"

This script needs to have execute permission (this can be done with chmod +x <setup-script>).

To run a TCP application, the following command may be used:

$ pbsdsh -u <setup-script> <hpx-application> --hpx:nodes=$(cat $PBS_NODEFILE) --hpx:endnodes <hpx-application-arguments>

Where <setup-script> is the absolute path to the setup script, <hpx-application> is the application, and <hpx-application-arguments> are the arguments passed to the application.

Example

$ pbsdsh -u $HOME/hpx/env.sh $HOME/hpx/tcp/bin/hello_world --hpx:nodes=$(cat $PBS_NODEFILE) --hpx:endnodes

Running HPX applications with the MPI parcelport

When run under PBS, HPX determines which nodes it is running on by opening the file $PBS_NODEFILE points to and examining its contents. However, SuperMIC makes this file available only to the PBS session that is running the job, meaning that not all HPX instances may be able to access it. If you run into this issue, there are two solutions:

Solution 1: Make the node file visible

Put the list of nodes in <node_file>:

$ uniq $PBS_NODEFILE ><node_file>
$ unset PBS_NODEFILE
$ export HPX_NODEFILE=<node_file>

Use mpirun to run the HPX application and pass $HPX_NODEFILE (the node file created above) as the file containing the list of nodes.

$ mpirun -f $HPX_NODEFILE <hpx-application>

Example

$ uniq $PBS_NODEFILE >actual.nodes
$ unset PBS_NODEFILE
$ mpirun -f actual.nodes ./build/bin/hello_world

Solution 2: Directly provide the list of nodes

The following command can be used to run HPX applications with the MPI parcelport:

mpirun_rsh -ssh -np $PBS_NUM_NODES $(uniq $PBS_NODEFILE) <hpx-application>

Example

$ mpirun_rsh -ssh -np $PBS_NUM_NODES $(uniq $PBS_NODEFILE) ./build/bin/hello_world

Queues

The following queues are available on SuperMIC:

Queue      Walltime (hh:mm:ss)   Nodes   Max Allocation Allowed   Comment
workq      72:00:00              128     128                      Regular Queue. Nodes have 2×Xeon Phi 7120P
checkpt    72:00:00              200     160                      Nodes have 2×Xeon Phi 7120P
hybrid     72:00:00              8       -                        Nodes have 1×Xeon Phi 7120P and 1×NVIDIA Tesla K20X. Not available through XSEDE.
priority   168:00:00             128     -                        Does not seem to be available for regular users

Viewing Queue Job Information

Use qstat to check the status of a job. This returns a status report including CPU, memory, and wall-time usage of all jobs that are either queued or running.

To view jobs from a particular user: qstat -u <user-name>; e.g., to view your own jobs: qstat -u $USER.

To see queue information: qstat -q.

If the -f flag is used, qstat shows the resources used, with statistics aggregated across the nodes the job is running on. The -a flag shows wall-time in hours and minutes.
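For instance, to list your own queued and running jobs with wall-times shown in hours and minutes:

$ qstat -a -u $USER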

Checking Resource Utilization

Use qshow to check resource utilization on nodes allocated to a job.

Example

$ qshow 11111
PBS job: 11111, nodes: 20
Hostname  Days Load CPU U# (User:Process:VirtualMemory:Memory:Hours)
smic001      9 0.11 609 26 parsa:1d_stencil_8:12.3G:11G parsa:pbs_demux:13M:1M parsa:mpirun:95M:7M parsa:hydra_pmi_proxy:94M:7M
smic002      9 0.06 493  4 parsa:1d_stencil_8:10.9G:10G parsa:hydra_pmi_proxy:96M:7M
smic003      9 1.45 490  4 parsa:1d_stencil_8:11.8G:11G parsa:hydra_pmi_proxy:96M:7M
smic004      9 0.00 482  4 parsa:1d_stencil_8:12.0G:11G parsa:hydra_pmi_proxy:96M:7M
smic005      9 0.00 489  4 parsa:1d_stencil_8:12.2G:11G parsa:hydra_pmi_proxy:96M:7M
smic006      9 1.30 490  4 parsa:1d_stencil_8:11.4G:11G parsa:hydra_pmi_proxy:96M:7M
smic007      9 1.27 490  4 parsa:1d_stencil_8:11.8G:11G parsa:hydra_pmi_proxy:96M:7M
smic008      9 3.07 169  4 parsa:1d_stencil_8:7.3G:6.8G parsa:hydra_pmi_proxy:96M:7M
smic009      9 1.44 509  4 parsa:1d_stencil_8:11.4G:10G parsa:hydra_pmi_proxy:96M:7M
smic010      9 1.37 481  4 parsa:1d_stencil_8:11.4G:10G parsa:hydra_pmi_proxy:96M:7M
smic011      9 1.47 485  4 parsa:1d_stencil_8:11.9G:11G parsa:hydra_pmi_proxy:96M:7M
smic012      9 1.26 489  4 parsa:1d_stencil_8:12.5G:12G parsa:hydra_pmi_proxy:96M:7M
smic013      9 1.30 479  4 parsa:1d_stencil_8:11.6G:11G parsa:hydra_pmi_proxy:96M:7M
smic014      9 1.25 486  4 parsa:1d_stencil_8:10.8G:10G parsa:hydra_pmi_proxy:96M:7M
smic015      9 1.15 493  4 parsa:1d_stencil_8:10.5G:10G parsa:hydra_pmi_proxy:96M:7M
smic016      9 0.40 485  4 parsa:1d_stencil_8:11.7G:11G parsa:hydra_pmi_proxy:96M:7M
smic017      9 1.19 473  4 parsa:1d_stencil_8:11.4G:11G parsa:hydra_pmi_proxy:96M:7M
smic018      9 0.00 457  4 parsa:1d_stencil_8:11.4G:10G parsa:hydra_pmi_proxy:96M:7M
smic019      9 0.00 480  4 parsa:1d_stencil_8:11.3G:10G parsa:hydra_pmi_proxy:96M:7M
smic020      9 1.09 480  4 parsa:1d_stencil_8:12.2G:11G parsa:hydra_pmi_proxy:96M:7M
PBS_job=11111 user=parsa allocation=hpc_supermic01 queue=workq total_load=19.18 cpu_hours=0.46 wall_hours=0.06 unused_nodes=0 total_nodes=20 ppn=20 avg_load=0.95 avg_cpu=475% avg_mem=10760mb avg_vmem=12171mb top_proc=parsa:1d_stencil_8:smic001:12.3G:11G:0.0hr:607% toppm=cchukw1:test38:smic001:229M:140M node_processes=4

Alter a Queued Job

If your job has not started yet, you can edit some of its attributes with the qalter command. This is useful when SuperMIC is busy and your job would lose its place in the queue if you cancelled it and enqueued another one. The script you passed as the argument, however, cannot be changed with this command. The syntax is:

$ qalter [options ...] jobid

For instance, if you want to change the wall-time limit on job 11111 to 5 hours:

$ qalter -l walltime=5:00:00 11111

Estimate Job Start Time

When you submit a job, it is queued, and depending on the current status of the queues it might wait some time before the resources are granted to it. showstart <job-id> gives you a rough estimate of the start time, which can be completely off; for instance, if it shows exactly midnight two or three days in the future, the estimate is meaningless.

Example

$ showstart 11111

Cancelling a job

To cancel a job: qdel <job-id>, where <job-id> is the ID of the job.
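For instance, to cancel the job with ID 11111 used in the earlier examples:

$ qdel 11111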

For more information about job scheduling, take a look at the How to Use HPX Applications with PBS section of the HPX Documentation, or visit http://www.hpc.lsu.edu/docs/pbs.php.
