
Support for allocations on several different job managers #695

Open
npf opened this issue Apr 19, 2024 · 1 comment

@npf

npf commented Apr 19, 2024

Hi,

In https://it4innovations.github.io/hyperqueue/stable/deployment/allocation/, the documentation says:

> You can create multiple allocation queues, and you can even combine PBS queues with Slurm queues.

Does that mean hyperqueue can auto-allocate on several HPC clusters with different submission frontends?

Technically speaking, I do not see how to configure the necessary remote access to submission frontends. In the code, the allocation function calls either sbatch or qsub directly, if I'm correct.

Should both the sbatch and qsub commands be available on the machine where the HyperQueue server runs?

Thanks.

@Kobzol
Collaborator

Kobzol commented Apr 19, 2024

Hi!

> Does that mean hyperqueue can auto-allocate on several HPC clusters with different submission frontends?

It does, although to actually support two different clusters, you'll need to run the HQ server somewhere that is accessible from both clusters (and their compute nodes) over TCP/IP, which might be a bit challenging. Also, if you want to use automatic allocation for this, it's a bit more complex (see below).

> Technically speaking, I do not see how to configure the necessary remote access to submission frontends. In the code, the allocation function calls either sbatch or qsub directly, if I'm correct.

It does indeed call sbatch/qsub directly. We have been thinking about providing a way to customize this mechanism, but we haven't seen a use-case for it yet. A simpler solution/workaround might be to provide a proxy that reroutes the sbatch/qsub calls from the node where the HQ server is deployed to the corresponding login nodes/frontends. You could probably write e.g. a simple Python program that acts as sbatch/qsub and communicates with the remote systems.
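For illustration, here's a minimal sketch of what such a proxy might look like. It makes several assumptions: the frontend hostname `login.cluster.example.org` is hypothetical, passwordless SSH to the frontend is configured, and the server invokes the proxy as `sbatch <script>` with the generated submission script as the last argument (the actual invocation should be checked against the HQ source).

```python
#!/usr/bin/env python3
"""Hypothetical sbatch stand-in: forwards a submission from the node running
the HQ server to a remote Slurm frontend over SSH."""
import subprocess
import sys

LOGIN_NODE = "login.cluster.example.org"  # hypothetical frontend hostname


def main() -> int:
    script = sys.argv[-1]  # path of the generated submission script
    # Copy the script to the same path on the frontend so that relative
    # paths inside it keep working (assumes the path is writable there).
    subprocess.run(["scp", "-q", script, f"{LOGIN_NODE}:{script}"], check=True)
    # Submit remotely and relay sbatch's output (which contains the job id)
    # back to the caller, i.e. the HQ server.
    result = subprocess.run(
        ["ssh", LOGIN_NODE, "sbatch", *sys.argv[1:]],
        capture_output=True,
        text=True,
    )
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    raise SystemExit(main())
```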

If you had a use-case for this, we could also implement e.g. a JSON-based auto-allocation backend, which could perform the allocation using whatever mechanism it needs.

> Should both the sbatch and qsub commands be available on the machine where the HyperQueue server runs?

Currently, yes, if you want to use auto-allocation (or you can use a proxy as described above).

If you don't use automatic allocation, you can also just provide the computational resources to HQ manually: run sbatch/qsub yourself on the corresponding clusters, and point the started HQ workers at the IP address of the HQ server. In that case the server does not need to know anything about sbatch/qsub.
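As a rough sketch of that manual flow (assuming a Slurm cluster, `hq` on the compute node's `PATH`, and that the server's directory containing the access file has been copied to `$HOME/hq-server-dir` on the cluster so the worker can find the server's address; the paths and resource limits are hypothetical):

```python
#!/usr/bin/env python3
"""Minimal sketch of manual (non-autoalloc) provisioning: submit a Slurm job
from a cluster's login node that starts a HyperQueue worker connecting back
to the HQ server."""
import subprocess
import textwrap

# Batch script that starts an HQ worker. The server directory must contain
# the server's access file (hostname/port), copied over from the machine
# where the HQ server runs.
JOB_SCRIPT = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --nodes=1
    hq worker start --server-dir="$HOME/hq-server-dir"
    """)

with open("hq-worker.sh", "w") as f:
    f.write(JOB_SCRIPT)

# Run this on the cluster's login node; the worker then connects to the
# server over TCP/IP using the address stored in the access file.
subprocess.run(["sbatch", "hq-worker.sh"], check=True)
```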
