Requesting a Python/CUDA example #18
Comments
According to the doc, yes it is. Unfortunately, I can't make the container job example work. The base container image is downloaded and the NVIDIA drivers are installed, but the command is not executed and the job exits with an error. GPU count can be set with this syntax.
There is now a dogs vs. cats CNN training example here, which uses PyTorch with acceleration via CUDA. GPU jobs (containerized or not) are supported in general: Batch will automatically install drivers (when the installGpuDrivers flag is set in the job spec) and will automatically set the necessary Docker options to give containers access to the GPU(s) for container runnables. I've tested the new sample (this one) recently, so it should be a reliable template for other PyTorch/CUDA jobs. Please let me know if you run into any issues.
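To make the flags mentioned above concrete, here is a minimal sketch of a Batch job spec that enables GPU driver installation and a containerized runnable. The field names (installGpuDrivers, accelerators, runnables) follow the public Batch API, but the image URI, machine type, accelerator type, and command are placeholder assumptions, not values from this thread:

```python
import json

# Hypothetical GPU job spec for Cloud Batch. Field names match the Batch
# API; the concrete values (image, machine type, GPU type) are placeholders.
job_spec = {
    "taskGroups": [
        {
            "taskSpec": {
                "runnables": [
                    {
                        "container": {
                            # Placeholder image; any CUDA-enabled image works.
                            "imageUri": "us-docker.pkg.dev/my-project/my-repo/pytorch-cuda:latest",
                            "commands": ["python", "train.py"],
                        }
                    }
                ]
            }
        }
    ],
    "allocationPolicy": {
        "instances": [
            {
                # Ask Batch to install NVIDIA drivers on the VM at startup.
                "installGpuDrivers": True,
                "policy": {
                    "machineType": "n1-standard-4",
                    "accelerators": [{"type": "nvidia-tesla-t4", "count": 1}],
                },
            }
        ]
    },
    "logsPolicy": {"destination": "CLOUD_LOGGING"},
}

print(json.dumps(job_spec, indent=2))
```

If this shape is right, the spec could be saved as job.json and submitted with something like `gcloud batch jobs submit my-gpu-job --location=us-central1 --config=job.json`; Batch should then handle the Docker GPU options for the container runnable itself.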
Hi,
The existing examples are very good, but given that the GPU/AI/ML features were highlighted in the introductory blog post ("Use accelerator-optimized resources."), it would be nice to see a full example here.
If it helps, I've tried this on my own, but got some errors:
The log output is:
And for reference, here's the info for my instance template:
EDIT: Digging through the `Job` spec to the `ComputeResource` spec, I see the following: Does this imply GPU jobs are not yet supported?