Node Usage - GPU metrics #60

Open
niknoronha opened this issue Aug 25, 2021 · 2 comments

Comments

@niknoronha

Could we get GPU stats added to the node usage metrics, similar to the CPU stats that were added in 18.0?

@lahwaacz
Contributor

lahwaacz commented Sep 1, 2021

There is an open PR for this: #57

@martialblog
Contributor

martialblog commented Jun 14, 2022

I think this needs some refactoring due to the newly introduced -gpus-acct parameter.

I've been looking at it and I'm not sure whether the code should go in node.go. That would make some sense, since all other per-node data is there. However, it might be better to add it to gpus.go, since it gets activated with the -gpus-acct parameter.

Here is some output from a multi-node cluster:

sinfo -h -N -O "NodeList: ,Gres: ,GresUsed:"

gpu-01 gpu:tesla:4 gpu:tesla:4(IDX:0-3)
gpu-02 gpu:tesla:4 gpu:tesla:4(IDX:0-3)
gpu-03 gpu:tesla:4 gpu:tesla:4(IDX:0-3)
gpu-04 gpu:tesla:4 gpu:tesla:3(IDX:0,2-3)
gpu-05 gpu:tesla:4 gpu:tesla:0(IDX:N/A)
gpu-06 gpu:tesla:4 gpu:tesla:1(IDX:3)
gpu-07 gpu:tesla:4 gpu:tesla:2(IDX:1-2)
gpu-08 gpu:tesla:4 gpu:tesla:4(IDX:0-3)
cpu-01 (null) gpu:0
cpu-02 (null) gpu:0
cpu-03 (null) gpu:0
cpu-04 (null) gpu:0

Or maybe even a new node_gpus.go?
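
Wherever the code ends up, it would need to turn the Gres and GresUsed columns into per-node counts. Below is a minimal Go sketch of that parsing step only, assuming the three whitespace-separated columns shown in the sinfo output above. The names NodeGPUs, parseGres and parseNodeGPUs are hypothetical and not taken from the exporter or from #57, and summing multiple GPU entries in one field is an assumption, since the sample shows only a single GPU type per node.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// NodeGPUs is a hypothetical container for per-node GPU counts.
type NodeGPUs struct {
	Total int // GPUs configured on the node (Gres column)
	Alloc int // GPUs currently allocated (GresUsed column)
}

// gresRe captures the count in entries such as "gpu:4" or "gpu:tesla:4".
// Anything after the count, e.g. "(IDX:0,2-3)", is ignored.
var gresRe = regexp.MustCompile(`gpu:(?:[^:(,]+:)?(\d+)`)

// parseGres sums the GPU counts found in one Gres or GresUsed field.
// Fields without GPU entries, such as "(null)", yield 0.
func parseGres(field string) int {
	sum := 0
	for _, m := range gresRe.FindAllStringSubmatch(field, -1) {
		if n, err := strconv.Atoi(m[1]); err == nil {
			sum += n
		}
	}
	return sum
}

// parseNodeGPUs assumes one node per line with exactly three
// whitespace-separated columns: node name, Gres, GresUsed.
// Lines with a different shape are skipped.
func parseNodeGPUs(output string) map[string]NodeGPUs {
	nodes := make(map[string]NodeGPUs)
	for _, line := range strings.Split(output, "\n") {
		cols := strings.Fields(line)
		if len(cols) != 3 {
			continue
		}
		nodes[cols[0]] = NodeGPUs{
			Total: parseGres(cols[1]),
			Alloc: parseGres(cols[2]),
		}
	}
	return nodes
}

func main() {
	// A few lines copied from the sinfo output in this thread.
	sample := "gpu-04 gpu:tesla:4 gpu:tesla:3(IDX:0,2-3)\n" +
		"gpu-05 gpu:tesla:4 gpu:tesla:0(IDX:N/A)\n" +
		"cpu-01 (null) gpu:0\n"

	nodes := parseNodeGPUs(sample)
	for _, name := range []string{"gpu-04", "gpu-05", "cpu-01"} {
		n := nodes[name]
		fmt.Printf("%s total=%d alloc=%d\n", name, n.Total, n.Alloc)
	}
}
```

Keeping the parsing in a pure function like this would make it easy to unit test against captured sinfo output, regardless of whether the collector itself lives in node.go, gpus.go or a new node_gpus.go.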
