--gpus flag #360
Comments
This is still worth pursuing prior to #361 (comment).
@dtrudg this looks like low hanging fruit so maybe I can help! Is this still desired, and if so, could you give a quick summary of what the implementation should do? Should this be tackled after this first set of envars are added, or at the same time? And if at the same time, could we chat about what that means? I'm not familiar with the current interaction with nvidia gpus!
Hi @vsoch - it is, unfortunately, not as easy as it first seems. For the case where the experimental … The catch is that …
I should say explicitly... if you'd like to take this on further... please reach out on Slack or similar and I can demonstrate some of the issues to you. I don't want to put you off completely here :-)
@dtrudg from my perspective, I would like to argue against implementing a `--gpus` flag …
Describe the solution you'd like
The `--gpus` flag for the nvidia docker runtime will configure the `nvidia-container-cli` setup so that e.g. `--gpus all` is equivalent to `NVIDIA_VISIBLE_DEVICES=all`, `NVIDIA_DRIVER_CAPABILITIES=utility`.

It would be an advantage to be able to use `--gpus` rather than requiring the individual environment variables to be set. A matching `SINGULARITY_GPUS` env var would be appropriate.

Note that with #361 we would read the NVIDIA_ env vars from the container instead of the host, so `--gpus` / `SINGULARITY_GPUS` are required to override.
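A minimal sketch of how such a flag value could be translated into the env vars that `nvidia-container-cli` reads is below. This is purely illustrative: SingularityCE has no such code yet, the `gpusToEnv` name and the accepted syntax (a comma-separated list of `all`, `device=<id>`, `capabilities=<caps>`, loosely modelled on docker's `--gpus` option) are assumptions, and docker's quoting rules for multi-device lists are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// gpusToEnv translates a --gpus / SINGULARITY_GPUS style value into the
// NVIDIA_ env vars that nvidia-container-cli understands. Hypothetical
// sketch only; the option names and defaults are assumptions, not
// SingularityCE code.
func gpusToEnv(value string) (map[string]string, error) {
	env := map[string]string{
		"NVIDIA_VISIBLE_DEVICES":     "all",
		"NVIDIA_DRIVER_CAPABILITIES": "utility", // assumed default, mirroring the issue text
	}
	for _, field := range strings.Split(value, ",") {
		field = strings.TrimSpace(field)
		switch {
		case field == "" || field == "all":
			env["NVIDIA_VISIBLE_DEVICES"] = "all"
		case strings.HasPrefix(field, "device="):
			env["NVIDIA_VISIBLE_DEVICES"] = strings.TrimPrefix(field, "device=")
		case strings.HasPrefix(field, "capabilities="):
			env["NVIDIA_DRIVER_CAPABILITIES"] = strings.TrimPrefix(field, "capabilities=")
		default:
			return nil, fmt.Errorf("unrecognised --gpus option: %q", field)
		}
	}
	return env, nil
}

func main() {
	// e.g. --gpus device=0,capabilities=utility
	env, err := gpusToEnv("device=0,capabilities=utility")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(env) // map[NVIDIA_DRIVER_CAPABILITIES:utility NVIDIA_VISIBLE_DEVICES:0]
}
```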
Edit - as noted in discussion below, because we aren't yet defaulting to `--nvccli`, it wouldn't be very friendly for `--gpus` not to apply to SingularityCE's own GPU setup. We need to handle device binding / masking in that case - but we could ignore the capabilities portion, and perhaps only support numeric GPU IDs, not MIG UUIDs etc.
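For the non-`--nvccli` case described in the edit above, one hedged option is to accept only plain numeric GPU indices for SingularityCE's own device binding / masking and to reject MIG UUIDs outright. The sketch below assumes a hypothetical `parseLegacyGPUList` helper; none of this exists in the codebase.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLegacyGPUList accepts a device spec such as "0,2" and returns the
// numeric GPU indices, rejecting anything else (e.g. a MIG UUID), since a
// legacy (non-nvccli) GPU setup could only mask devices by index.
// Hypothetical helper, not present in SingularityCE.
func parseLegacyGPUList(spec string) ([]int, error) {
	var ids []int
	for _, tok := range strings.Split(spec, ",") {
		id, err := strconv.Atoi(strings.TrimSpace(tok))
		if err != nil {
			return nil, fmt.Errorf("only numeric GPU IDs are supported here, got %q", tok)
		}
		ids = append(ids, id)
	}
	return ids, nil
}

func main() {
	fmt.Println(parseLegacyGPUList("0,2"))                // [0 2] <nil>
	fmt.Println(parseLegacyGPUList("MIG-1234-example"))   // nil slice and an error
}
```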