[feature request] Support Intel GPU #1094
Hi @houyushan - we'll have to look into this a little more. It appears from looking at the relevant Intel container images that the intention is to have the libraries in the container, and make the … This means that you should be able to run without any additional options, unless you are using …
Okay, thank you. I will also continue to research and test.
Just a note: if we assume CDI support in OCI mode, using a CDI spec generated for the Intel devices would allow injection of these. See #813.
@elezar - yep, thanks. This was a hope at the back of my mind :-)
Hello, any news on this issue? Using SingularityCE version 4.0.2 with an Intel GPU Max 1550, I don't have access to the GPU, even though the card is listed in …
As mentioned in a comment above, Singularity's OCI-mode supports CDI (Container Device Interface) configuration for access to GPUs, which would include Intel GPUs if a CDI configuration is available. With regard to adding a direct Intel GPU flag for the default native, non-OCI, mode... generally, adding this kind of hardware-specific support into SingularityCE is dependent on either: …
NVIDIA GPU support comes under (2), as we have had significant contributions from NVIDIA, and it is also trivial to access Tesla GPUs at reasonable cost via public cloud providers. What we wish to avoid, when adding Intel GPU support, is the situation we find ourselves in with AMD GPUs / ROCm. The lack of access to data center AMD GPUs (capable of running the latest ROCm) in the cloud, or by other means, makes maintaining ROCm support difficult / costly. If you are able to, we would suggest that you indicate to Intel that support integrated into SingularityCE is important to you. Without access to hardware, the minimum information required for us to add an experimental flag, without commitment that it will be well maintained, would be: …
I would strongly recommend following the CDI route here instead of relying on vendor-specific logic in Singularity. If effort is to be spent, I would recommend adding (experimental) CDI support to the native mode of Singularity (see #1395) if the support is required there. @kad do you have any visibility on the generation of CDI specifications for Intel devices?
I checked the OCI way and CDI, but I cannot access the GPU out of the box. I guess I should indicate a CDI file with … It would be nice to have more documentation about this.
The CDI specs are currently generated automatically only by the kubelet-plugin part of the DRA resource-driver. If you don't need dynamic creation of the specs, it's possible to create them manually; they are quite simple. There is a chance, however, that they will need to be fixed after a reboot if you have multiple GPUs that are different, or if you have an integrated GPU that also gets enabled in DRM, because the DRM device indexes are not persistent across reboots. For instance, /dev/dri/card0 can become card1, and card1 might become card0.
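Because the cardN / renderDN indexes can move around, one option is to resolve the stable `/dev/dri/by-path` symlinks (which are keyed by PCI address) back to the current device nodes before writing a spec. Here's a minimal sketch, assuming the standard `pci-<ADDRESS>-card` / `pci-<ADDRESS>-render` by-path naming; the function name is just for illustration:

```python
# Sketch: map PCI bus addresses to their current /dev/dri nodes via the
# /dev/dri/by-path symlinks, which stay stable across reboots even when
# the cardN / renderDN indices move around. Illustrative only.
import os
import re

def map_pci_to_nodes(by_path="/dev/dri/by-path"):
    """Return {pci_address: {"card": node_path, "render": node_path}}."""
    mapping = {}
    for entry in sorted(os.listdir(by_path)):
        m = re.match(r"pci-(.+)-(card|render)$", entry)
        if not m:
            continue
        address, kind = m.groups()
        # Resolve the symlink to the current /dev/dri/cardN or renderDN node.
        target = os.path.realpath(os.path.join(by_path, entry))
        mapping.setdefault(address, {})[kind] = target
    return mapping

if __name__ == "__main__":
    if os.path.isdir("/dev/dri/by-path"):
        for address, nodes in map_pci_to_nodes().items():
            print(address, nodes)
```

The PCI-address-keyed output can then be used for the `name` fields of a CDI spec, so the spec stays valid even if the DRM indices change.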
Here's an example of a CDI spec:

```yaml
cdiVersion: 0.5.0
containerEdits: {}
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/dri/card1
      type: c
    - path: /dev/dri/renderD129
      type: c
  name: 0000:03:00.0-0x56a0
- containerEdits:
    deviceNodes:
    - path: /dev/dri/card0
      type: c
    - path: /dev/dri/renderD128
      type: c
  name: 0000:00:02.0-0x4680
kind: intel.com/gpu
```

The `name` field can be somewhat arbitrary, albeit with spelling restrictions. If you just create the /etc/cdi folder and paste the contents of the above snippet into a file inside that folder, it should work, given that your runtime supports CDI.

Then …
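As a rough illustration of how a spec like the one above could be scripted rather than hand-written, here is a hypothetical helper (not part of any Intel tooling); the field names follow the CDI 0.5.0 schema, and it emits JSON, which CDI runtimes accept alongside YAML:

```python
# Sketch: build a minimal CDI spec dict for Intel GPUs from a list of
# (name, device node paths) pairs. Hypothetical helper, for illustration.
import json

def build_cdi_spec(gpus):
    """gpus: list of (device name, [device node paths]) tuples."""
    return {
        "cdiVersion": "0.5.0",
        "kind": "intel.com/gpu",
        "containerEdits": {},
        "devices": [
            {
                "name": name,
                "containerEdits": {
                    "deviceNodes": [{"path": p, "type": "c"} for p in paths]
                },
            }
            for name, paths in gpus
        ],
    }

if __name__ == "__main__":
    spec = build_cdi_spec(
        [("0000:03:00.0-0x56a0", ["/dev/dri/card1", "/dev/dri/renderD129"])]
    )
    # Writing this out to a file under /etc/cdi would make it visible
    # to CDI-aware runtimes; JSON avoids a YAML library dependency.
    print(json.dumps(spec, indent=2))
```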
I see, is there a way to get these configuration files without writing them by hand? When I googled "intel gpu container device interface", I couldn't find anything like that. How is the user supposed to know this?
Could you please add more details about this case: what was the command line you used, with what options?
My bad, I missed one of your answers. Hmm, I'm not sure I understand this line: …
Should I install Kubernetes as well? Noob question here.
In my case, I have a machine with four Intel GPU Max 1550, and I want to run code within an Intel oneAPI image. For the demonstration, I just use …
As you can see, only the CPU is detected. This is what I should see: …
There is no need to install Kubernetes; I meant that automated generation of the CDI specs is at the moment available only in K8s. Once you have created the … you have to use the … The yaml file I quoted above is just an example. Check what the DRM index of the GPU is, for instance: … We'll work on finding a way to generate CDI specs, or at least on documenting it.
Ok, I see. I think it would be nice to have a better way to generate these CDI specs. The logic from the Kubernetes plugin could be extracted. If I'm not wrong, you can completely deduce them from the structure in …
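The idea of deducing the spec from sysfs can be sketched as follows, assuming the standard Linux DRM layout where each `/sys/class/drm/cardN/device` symlink resolves to the PCI device directory (the helper name is hypothetical):

```python
# Sketch: group DRM nodes by GPU using /sys/class/drm. Each
# /sys/class/drm/cardN/device (and renderDN/device) symlink resolves to
# the PCI device directory, whose basename is the PCI address.
# Assumptions based on the standard Linux DRM sysfs layout.
import os
import re

def enumerate_drm(sys_drm="/sys/class/drm"):
    """Return {pci_address: [drm node names]},
    e.g. {'0000:03:00.0': ['card1', 'renderD129']}."""
    gpus = {}
    for name in sorted(os.listdir(sys_drm)):
        # Skip connector entries like card0-DP-1; keep cardN / renderDN.
        if not re.fullmatch(r"card\d+|renderD\d+", name):
            continue
        dev_link = os.path.join(sys_drm, name, "device")
        if not os.path.islink(dev_link):
            continue
        pci_address = os.path.basename(os.path.realpath(dev_link))
        gpus.setdefault(pci_address, []).append(name)
    return gpus
```

Feeding this grouping into a spec-writing step would give exactly the per-GPU `deviceNodes` lists shown in the example spec above.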
So, I tried with the example CDI specs file that I adapted for my hardware, but the GPU is still not visible from within the container: …

Where the CDI specs look like:

```yaml
cdiVersion: 0.5.0
containerEdits: {}
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/dri/card1
      type: c
    - path: /dev/dri/renderD128
      type: c
  name: 0000:29:00.0
...
kind: intel.com/gpu
```
@pzehner can you check whether …
Yes, I have the correct devices listed in …
Roger, I downloaded the same image and tried it within Docker.
I don't exactly know why. I'd use the 22.04 variant as a workaround, if that suits you.
I think using an up-to-date image is acceptable.
Closing this issue. CDI support is available in … Support for Intel GPUs in native mode would come via #1395; however, this is not firmly on the development roadmap at this time.
Hello, use Intel GPU (XPU)
My product needs to use Singularity/Apptainer with the Intel XPU (the latest Intel GPU for AI training). Is there a command like --nv to support it at present, or will future versions provide a similar option? When will that be?