This is a quick start for using the RWTH cluster. Most of the details are summarized from the RWTH docs, the HPC wiki, or from emails from the IT Center.
This has been explained really well here.
Select a node from here and use the following command for SSH login:
ssh -l userid login-g.hpc.itc.rwth-aachen.de
Imagine the system like this: after logging in via SSH you get a Linux PC environment. You can use this space to load your data and install the programs you need, such as conda. You can't run anything for long in this environment, since it is meant for testing. The environment is shared across all login nodes. Your job-script.sh can access data and programs in this environment and also save logs or results there. Later the data can be transferred to your PC via SFTP.
There are two modes for using the GPU clusters:
- For any non-trivial amount of GPU compute time you are obliged to use the batch system, see here.
- It is also possible to use GPUs for interactive tests, see here. Note that these tests must be SHORT and not too many. The two interactive front ends equipped with GPUs are for testing only, not for production jobs of any extent.
The limitations of both modes are described here.
'Small use' is available within the free quota, see here and here. Please start with this. If you need more resources, file a project proposal. Remember that only about 5% of the nodes are equipped with GPUs, so if your code runs well on CPUs, avoid using GPUs! Using GPUs only pays off if it makes execution (significantly!) faster than on a full node without GPUs.
It is possible to switch between CUDA versions as described here. For that it is important to read and understand the module system. Essential background: loading a module does not change any installation by itself; it makes the software available for use by changing environment variables.
- Show the available CUDA versions:
module avail cuda
- Load a specific CUDA version, here CUDA 11:
module load cuda11
- Show the location of the loaded CUDA installation:
echo $CUDA_ROOT
Since conda will be installed in $HOME, there won't be any problem installing it for your user. The '--user' flag can be used (e.g. with pip) to stay in user space.
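A minimal sketch of installing Miniconda into your home directory; the installer filename uses Anaconda's generic "latest" link, so verify the current installer name and checksum before use:

```shell
# Download the Miniconda installer (generic "latest" link from the Anaconda repo)
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

# Run it in batch mode (-b) and install into $HOME/miniconda3 (-p), staying in user space
bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"

# Make conda available in the current shell
source "$HOME/miniconda3/etc/profile.d/conda.sh"
conda activate base
```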
After writing the job.sh file, you can submit the job with:
sbatch job.sh
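For reference, a minimal GPU job script could look like the following sketch. The SBATCH values (time, memory, GPU count) and the python command are placeholders; check the RWTH docs for the exact options required on this cluster:

```shell
#!/usr/bin/env zsh
#SBATCH --job-name=test-gpu        # job name shown in squeue
#SBATCH --output=logs/%x_%j.log    # log file (%x = job name, %j = job id)
#SBATCH --time=00:30:00            # wall-clock limit hh:mm:ss
#SBATCH --mem-per-cpu=4G           # memory per CPU core
#SBATCH --gres=gpu:1               # request one GPU

# Load the CUDA version your code needs (example version)
module load cuda11

# Run the actual workload
python train.py
```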
See the status of your jobs:
squeue -u userID
See the predicted start time of your jobs:
squeue -u userID --start
Get live updates from the log file:
tail -f /path/logfile.log
Get the last 100 lines of the log file:
tail -n 100 /path/logfile.log
Two good job script examples can be found here:
https://github.com/SamSweere/Covid19-News-Analysis/wiki/RWTH-GPU-Cluster-stuff
https://github.com/pfackeldey/Machine-Learning/wiki/GPU-Batch-System
https://help.itc.rwth-aachen.de/service/rhr4fjjutttf/article/1774cf956ee34bafb2cade94cdef0949
It is also possible to use SFTP with your preferred program, e.g. FileZilla, to transfer files from your PC to the cluster or vice versa. https://help.itc.rwth-aachen.de/service/rhr4fjjutttf/article/db3e5fd39d1d42c9815b4fa689719ac9/
[email protected]
port: 22
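Command-line tools work as well. A sketch using scp over the same port 22 connection; hostname and paths are placeholders, use your userid and a login node from the list above:

```shell
# Copy a file from your PC to your home directory on the cluster
scp results.csv userid@login-g.hpc.itc.rwth-aachen.de:~/

# Copy a directory from the cluster back to your PC (-r = recursive)
scp -r userid@login-g.hpc.itc.rwth-aachen.de:~/logs ./logs
```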