
Support for allocations on several different job managers #695

Open
npf opened this issue Apr 19, 2024 · 1 comment

@npf

npf commented Apr 19, 2024

Hi,

In https://it4innovations.github.io/hyperqueue/stable/deployment/allocation/, the documentation says:

> You can create multiple allocation queues, and you can even combine PBS queues with Slurm queues.

Does that mean hyperqueue can auto-allocate on several HPC clusters with different submission frontends?

Technically speaking, I do not see how to configure the necessary remote access to submission frontends. In the code, the allocation function calls either sbatch or qsub directly, if I'm correct.

Should both the sbatch and qsub commands be available on the machine where the HyperQueue server runs?

Thanks.

@Kobzol
Collaborator

Kobzol commented Apr 19, 2024

Hi!

> Does that mean hyperqueue can auto-allocate on several HPC clusters with different submission frontends?

It does, although to actually support two different clusters, you'll need to run the HQ server somewhere that is accessible from both clusters (and their compute nodes) over TCP/IP, which might be a bit challenging. Also, if you want to use automatic allocation for this, it's a bit more complex (see below).

> Technically speaking, I do not see how to configure the necessary remote access to submission frontends. In the code, the allocation function calls either sbatch or qsub directly, if I'm correct.

It does indeed call sbatch/qsub directly. We have been thinking about providing a way to customize this mechanism, but we haven't seen a use-case for it yet. A simpler solution/workaround might be to provide a proxy that reroutes the sbatch/qsub calls from the node where the HQ server is deployed to the corresponding login nodes/frontends. You could probably write e.g. a simple Python program that acts as sbatch/qsub and communicates with the remote systems.
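For illustration, here's a minimal sketch of what such a proxy might look like. It makes several assumptions: the frontend hostname `login.cluster.example.org` is hypothetical, passwordless SSH to the frontend is configured, and the server invokes the proxy as `sbatch <script>` with the generated submission script as the last argument (the actual invocation should be checked against the HQ source).

```python
#!/usr/bin/env python3
"""Hypothetical sbatch stand-in: forwards a submission from the node running
the HQ server to a remote Slurm frontend over SSH."""
import subprocess
import sys

LOGIN_NODE = "login.cluster.example.org"  # hypothetical frontend hostname


def main() -> int:
    script = sys.argv[-1]  # path of the generated submission script
    # Copy the script to the same path on the frontend so that relative
    # paths inside it keep working (assumes the path is writable there).
    subprocess.run(["scp", "-q", script, f"{LOGIN_NODE}:{script}"], check=True)
    # Submit remotely and relay sbatch's output (which contains the job id)
    # back to the caller, i.e. the HQ server.
    result = subprocess.run(
        ["ssh", LOGIN_NODE, "sbatch", *sys.argv[1:]],
        capture_output=True,
        text=True,
    )
    sys.stdout.write(result.stdout)
    sys.stderr.write(result.stderr)
    return result.returncode


if __name__ == "__main__":
    raise SystemExit(main())
```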

If you had a use-case for this, we could also implement e.g. a JSON-based auto-allocation backend, which could perform the allocation using whatever mechanism it needs.

> Should both the sbatch and qsub commands be available on the machine where the HyperQueue server runs?

Currently, yes, if you want to use auto-allocation (or you can use a proxy as described above).

If you don't use automatic allocation, you can also just provide the computational resources to HQ manually: run sbatch/qsub yourself on the corresponding clusters, and point the started HQ workers at the IP address of the HQ server. In that case the server does not need to know anything about sbatch/qsub.
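As a rough sketch of that manual flow (assuming a Slurm cluster, `hq` on the compute node's `PATH`, and that the server's directory containing the access file has been copied to `$HOME/hq-server-dir` on the cluster so the worker can find the server's address; the paths and resource limits are hypothetical):

```python
#!/usr/bin/env python3
"""Minimal sketch of manual (non-autoalloc) provisioning: submit a Slurm job
from a cluster's login node that starts a HyperQueue worker connecting back
to the HQ server."""
import subprocess
import textwrap

# Batch script that starts an HQ worker. The server directory must contain
# the server's access file (hostname/port), copied over from the machine
# where the HQ server runs.
JOB_SCRIPT = textwrap.dedent("""\
    #!/bin/bash
    #SBATCH --time=01:00:00
    #SBATCH --nodes=1
    hq worker start --server-dir="$HOME/hq-server-dir"
    """)

with open("hq-worker.sh", "w") as f:
    f.write(JOB_SCRIPT)

# Run this on the cluster's login node; the worker then connects to the
# server over TCP/IP using the address stored in the access file.
subprocess.run(["sbatch", "hq-worker.sh"], check=True)
```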
