longest-job

A quick hack tool to show when busy cluster nodes could become free (i.e. when the last running Slurm job on each node exits).

By default, it reports the high-water mark job (the running job with the latest end time) on every cluster node that has at least one active job (it calls squeue --states=R under the hood). A handful of familiar squeue options can be used to further limit reporting to certain nodes (by partition, by account, by QoS, etc.). Alternatively, explicit Slurm-style nodelists can be passed as arguments.
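The per-node selection described above can be sketched in Python. This is a minimal illustration, not the tool's actual implementation: the function name `longest_job_per_node`, the tuple layout, and the second job ID on bell-b000 are all hypothetical, and the records are assumed to be already expanded to one (node, job, end time) tuple per node rather than read from a live squeue call.

```python
from datetime import datetime


def longest_job_per_node(records):
    """Return {node: (job_id, end_time)} for the latest-ending job on each node.

    `records` is an iterable of (node, job_id, end_time) tuples, where
    end_time is an ISO 8601 string such as "2021-06-26T11:52:00".
    (Hypothetical input layout, chosen for illustration.)
    """
    latest = {}
    for node, job_id, end_time in records:
        end = datetime.fromisoformat(end_time)
        # Keep only the job that ends last on this node.
        if node not in latest or end > latest[node][1]:
            latest[node] = (job_id, end)
    return latest


# Hypothetical sample: two running jobs on bell-b000, one on bell-b001.
sample = [
    ("bell-b000", "3640405", "2021-06-26T11:52:00"),
    ("bell-b000", "3640111", "2021-06-25T09:00:00"),
    ("bell-b001", "3606031", "2021-06-25T21:48:04"),
]

for node, (job_id, end) in sorted(longest_job_per_node(sample).items()):
    print(f"{node:<13} {job_id:<13} {end.isoformat()}")
```

A node that never appears in the records (for example an idle one) simply has no entry in the result, so nothing is printed for it.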

Usage:

    longest-job [-h|--help] [OPTIONS] [nodelist(s)]

General options:

  -t, --time     Sort output by job end time (default is to sort by node name).
  -v, --verbose  Be verbose (print more job information for each job).
      --quiet    Be really quiet (suppress header and non-essential output).
  -V, --version  Print program version and exit.
  -h, --help     Display this help message and exit.

Recognized squeue-style filtering options (passed verbatim to the underlying squeue call; see man squeue for further details):

  -A, --account=<account_list>
  -j, --jobs=<job_id_list>
  -L, --licenses=<license_list>
  -M, --clusters=<clusters_list>
  -n, --name=<name_list>
  -p, --partition=<part_list>
  -q, --qos=<qos_list>
  -R, --reservation=<reservation_name>
  -u, --user=<user_list>
  -w, --nodelist=<hostlist>

In true squeue fashion, multiple filters are AND-ed. If both the -w hostlist flag and explicit node names ($1...$n) are given, the explicit ones win.
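One way to picture this precedence rule is as a small argument builder. The sketch below is an assumption about how such logic could look, not the tool's own code: `build_squeue_args` and its parameters are hypothetical names, the user name is made up, and the absolute-path special case for nodelists is not handled here.

```python
def build_squeue_args(filters, w_hostlist=None, positional_nodes=()):
    """Assemble a squeue argument list (hypothetical helper).

    filters:          dict of long option names (e.g. "partition") to values
    w_hostlist:       value given via -w/--nodelist, if any
    positional_nodes: explicit nodelists given as arguments ($1...$n)
    """
    args = ["squeue", "--states=R"]
    # Every filter is passed through; squeue itself AND-s them together.
    for name, value in filters.items():
        args.append(f"--{name}={value}")
    # Explicit node names win over the -w hostlist when both are present.
    if positional_nodes:
        nodelist = ",".join(positional_nodes)
    else:
        nodelist = w_hostlist
    if nodelist:
        args.append(f"--nodelist={nodelist}")
    return args


print(build_squeue_args(
    {"partition": "bell-b", "user": "alice"},   # "alice" is a made-up user
    w_hostlist="bell-b[000-003]",
    positional_nodes=["bell-b010", "bell-b011"],
))
```

Here the -w value is dropped entirely, and only the two explicitly named nodes end up in the --nodelist option.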

As a special case, nodelists in $1...$n can be absolute paths. Cue this excerpt from man scontrol:

scontrol show hostlist can also take the absolute pathname of a file (beginning with the character '/') containing a list of hostnames.
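The leading-slash convention can be illustrated with a short Python sketch. It is only an approximation of the idea: the helper name `read_nodelist_arg` is hypothetical, and the file is assumed to hold whitespace-separated hostnames (for example one per line). In practice scontrol show hostlist reads such a file itself.

```python
import os
import tempfile


def read_nodelist_arg(arg):
    """Return the node names or hostlist expressions for one argument.

    An argument starting with '/' is treated as the absolute path of a
    file of hostnames (assumed whitespace-separated); anything else is
    returned unchanged as a single Slurm-style nodelist expression.
    """
    if arg.startswith("/"):
        with open(arg) as fh:
            return fh.read().split()
    return [arg]


# Demonstration with a temporary file of hostnames.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("bell-b000\nbell-b001\nbell-b002\n")
    path = os.path.abspath(tmp.name)

print(read_nodelist_arg(path))
print(read_nodelist_arg("bell-b[000-002]"))
os.remove(path)
```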

Note: if a node has no running jobs (e.g. it is idle or offlined), no output is generated for it, even if the node has been specified explicitly, because "no job" means "no longest job either".

Example

By node in a given partition:

$ longest-job --partition bell-b | head -3
NODE          JobID         END_TIME
bell-b000     3640405       2021-06-26T11:52:00
bell-b001     3606031       2021-06-25T21:48:04
bell-b002     3598003       2021-06-26T06:34:30

Or sorted by job end time, earliest first:

$ longest-job --partition bell-b -t | head -3
NODE          JobID         END_TIME
bell-b001     3606031       2021-06-25T21:48:04
bell-b004     3606035       2021-06-25T21:52:04
bell-b002     3598003       2021-06-26T06:34:30
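The difference between the two orderings can be reproduced with a few lines of Python. This is a sketch under assumptions: `sort_rows` is a hypothetical helper, and the rows are just the four distinct lines from the two sample outputs above, so the full listing on a real cluster would differ.

```python
# Rows taken from the two sample outputs above: (node, job ID, end time).
rows = [
    ("bell-b000", "3640405", "2021-06-26T11:52:00"),
    ("bell-b001", "3606031", "2021-06-25T21:48:04"),
    ("bell-b002", "3598003", "2021-06-26T06:34:30"),
    ("bell-b004", "3606035", "2021-06-25T21:52:04"),
]


def sort_rows(rows, by_time=False):
    """Sort by node name (default) or by end time, mimicking -t."""
    if by_time:
        # Timestamps in this fixed-width ISO 8601 form sort chronologically
        # as plain strings; the node name breaks ties.
        return sorted(rows, key=lambda r: (r[2], r[0]))
    return sorted(rows, key=lambda r: r[0])


print("NODE          JobID         END_TIME")
for node, job_id, end in sort_rows(rows, by_time=True):
    print(f"{node:<13} {job_id:<13} {end}")
```

Sorting by time puts bell-b001 and bell-b004 first, since their jobs end on 2021-06-25, which matches the order of the -t output above.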

Author:

Lev Gorenstein, Purdue University Research Computing, 2021.

Contribute: https://github.com/lgorenstein/longest-job