[feature request] Support Intel GPU #1094
Hi @houyushan - we'll have to look into this a little more. It appears from looking at the relevant Intel container images that the intention is to have the libraries in the container, and make the … This means that you should be able to run without any additional options, unless you are using …
Okay, thank you. I will also continue to research and test.
Just a note: if we assume CDI support in OCI mode, using a CDI spec generated for the Intel devices would allow injection of these. See #813.
@elezar - yep, thanks. This was a hope at the back of my mind :-)
Hello, any news on this issue? Using SingularityCE version 4.0.2 with an Intel GPU Max 1550, I don't have access to the GPU, even though the card is listed in …
As mentioned in a comment above, Singularity's OCI-mode supports CDI (Container Device Interface) configuration for access to GPUs, which would include Intel GPUs if a CDI configuration is available. With regard to adding a direct Intel GPU flag for the default native, non-OCI, mode... generally, adding this kind of hardware-specific support into SingularityCE is dependent on either: …
NVIDIA GPU support comes under (2), as we have had significant contributions from NVIDIA, and it is also trivial to access Tesla GPUs at reasonable cost via public cloud providers. What we wish to avoid, when adding Intel GPU support, is the situation we find ourselves in with AMD GPUs / ROCm. The lack of access to data center AMD GPUs (capable of running the latest ROCm) in the cloud, or by other means, makes maintaining ROCm support difficult / costly. If you are able to, we would suggest that you indicate to Intel that support integrated into SingularityCE is important to you. Without access to hardware, the minimum information required for us to add an experimental flag, without commitment that it will be well maintained, would be: …
I would strongly recommend following the CDI route here instead of relying on vendor-specific logic in Singularity. If effort is to be spent, I would recommend adding (experimental) CDI support to the native mode of Singularity (see #1395) if the support is required there. @kad do you have any visibility on the generation of CDI specifications for Intel devices?
I checked the OCI way and CDI, but I cannot access the GPU out of the box. I guess I should indicate a CDI file with … It would be nice to have more documentation about this.
The CDI specs are currently generated automatically only by the kubelet-plugin part of the DRA resource-driver. If you don't need dynamic creation of the specs, it's possible to create them manually; they are quite simple. There is a chance, however, that they will need to be fixed after a reboot if you have multiple GPUs that are different, or if you have an integrated GPU that also gets enabled in DRM, because the DRM device indexes are not persistent across reboots. For instance, /dev/dri/card0 can become card1, and card1 might become card0.
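Because the cardN / renderDN indexes can move around, one option is to resolve the stable `/dev/dri/by-path` symlinks (which are keyed by PCI address) back to the current device nodes before writing a spec. Here's a minimal sketch, assuming the standard `pci-<ADDRESS>-card` / `pci-<ADDRESS>-render` by-path naming; the function name is just for illustration:

```python
# Sketch: map PCI bus addresses to their current /dev/dri nodes via the
# /dev/dri/by-path symlinks, which stay stable across reboots even when
# the cardN / renderDN indices move around. Illustrative only.
import os
import re

def map_pci_to_nodes(by_path="/dev/dri/by-path"):
    """Return {pci_address: {"card": node_path, "render": node_path}}."""
    mapping = {}
    for entry in sorted(os.listdir(by_path)):
        m = re.match(r"pci-(.+)-(card|render)$", entry)
        if not m:
            continue
        address, kind = m.groups()
        # Resolve the symlink to the current /dev/dri/cardN or renderDN node.
        target = os.path.realpath(os.path.join(by_path, entry))
        mapping.setdefault(address, {})[kind] = target
    return mapping

if __name__ == "__main__":
    if os.path.isdir("/dev/dri/by-path"):
        for address, nodes in map_pci_to_nodes().items():
            print(address, nodes)
```

The PCI-address-keyed output can then be used for the `name` fields of a CDI spec, so the spec stays valid even if the DRM indices change.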
Here's an example of a CDI spec:

```yaml
cdiVersion: 0.5.0
containerEdits: {}
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/dri/card1
      type: c
    - path: /dev/dri/renderD129
      type: c
  name: 0000:03:00.0-0x56a0
- containerEdits:
    deviceNodes:
    - path: /dev/dri/card0
      type: c
    - path: /dev/dri/renderD128
      type: c
  name: 0000:00:02.0-0x4680
kind: intel.com/gpu
```

The `name` field can be somewhat arbitrary, albeit with spelling restrictions. If you just create the /etc/cdi folder and paste the contents of the above snippet into a file inside that folder, it should work, given that your runtime supports CDI.

Then …
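As a rough illustration of how a spec like the one above could be scripted rather than hand-written, here is a hypothetical helper (not part of any Intel tooling); the field names follow the CDI 0.5.0 schema, and it emits JSON, which CDI runtimes accept alongside YAML:

```python
# Sketch: build a minimal CDI spec dict for Intel GPUs from a list of
# (name, device node paths) pairs. Hypothetical helper, for illustration.
import json

def build_cdi_spec(gpus):
    """gpus: list of (device name, [device node paths]) tuples."""
    return {
        "cdiVersion": "0.5.0",
        "kind": "intel.com/gpu",
        "containerEdits": {},
        "devices": [
            {
                "name": name,
                "containerEdits": {
                    "deviceNodes": [{"path": p, "type": "c"} for p in paths]
                },
            }
            for name, paths in gpus
        ],
    }

if __name__ == "__main__":
    spec = build_cdi_spec(
        [("0000:03:00.0-0x56a0", ["/dev/dri/card1", "/dev/dri/renderD129"])]
    )
    # Writing this out to a file under /etc/cdi would make it visible
    # to CDI-aware runtimes; JSON avoids a YAML library dependency.
    print(json.dumps(spec, indent=2))
```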
I see, is there a way to get these configuration files without writing them by hand? When I googled "intel gpu container device interface", I couldn't find anything like that. How is the user supposed to know this?
Could you please add more details about this case: what was the command line you used, with what options?
My bad, I missed one of your answers. Hmm, I'm not sure I understand this line: …
Should I install Kubernetes as well? Noob question here.
In my case, I have a machine with four Intel GPU Max 1550, and I want to run code within an Intel oneAPI image. For the demonstration, I just use …
As you can see, only the CPU is detected. This is what I should see: …
There is no need to install Kubernetes; I meant that automated generation of the CDI specs is at the moment available only in K8s. Once you have created the … you have to use the … The yaml file I quoted above is just an example. Check what the DRM index of the GPU is, for instance: … We'll work on finding a way to generate CDI specs, or at least on documenting it.
Ok, I see. I think it would be nice to have a better way to generate these CDI specs. The logic from the Kubernetes plugin could be extracted. If I'm not wrong, you can completely deduce them from the structure in …
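The idea of deducing the spec from sysfs can be sketched as follows, assuming the standard Linux DRM layout where each `/sys/class/drm/cardN/device` symlink resolves to the PCI device directory (the helper name is hypothetical):

```python
# Sketch: group DRM nodes by GPU using /sys/class/drm. Each
# /sys/class/drm/cardN/device (and renderDN/device) symlink resolves to
# the PCI device directory, whose basename is the PCI address.
# Assumptions based on the standard Linux DRM sysfs layout.
import os
import re

def enumerate_drm(sys_drm="/sys/class/drm"):
    """Return {pci_address: [drm node names]},
    e.g. {'0000:03:00.0': ['card1', 'renderD129']}."""
    gpus = {}
    for name in sorted(os.listdir(sys_drm)):
        # Skip connector entries like card0-DP-1; keep cardN / renderDN.
        if not re.fullmatch(r"card\d+|renderD\d+", name):
            continue
        dev_link = os.path.join(sys_drm, name, "device")
        if not os.path.islink(dev_link):
            continue
        pci_address = os.path.basename(os.path.realpath(dev_link))
        gpus.setdefault(pci_address, []).append(name)
    return gpus
```

Feeding this grouping into a spec-writing step would give exactly the per-GPU `deviceNodes` lists shown in the example spec above.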
So, I tried with the example CDI specs file that I adapted for my hardware, but the GPU is still not visible from within the container: …

Where the CDI specs look like:

```yaml
cdiVersion: 0.5.0
containerEdits: {}
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/dri/card1
      type: c
    - path: /dev/dri/renderD128
      type: c
  name: 0000:29:00.0
...
kind: intel.com/gpu
```
@pzehner can you check whether …
Yes, I have the correct devices listed in …
Roger, I downloaded the same image and tried it within Docker.
I don't exactly know why. I'd use the 22.04 variant as a workaround, if that suits you.
I think using an up-to-date image is acceptable.
Closing this issue. CDI support is available in … Support for Intel GPUs in native mode would come via #1395; however, this is not firmly on the development roadmap at this time.
Hello, use Intel GPU (XPU)
My product needs to use Singularity/Apptainer with the Intel XPU (the latest Intel GPU for AI training). Is there a command like --nv to support it at present, or will future versions provide a similar option? When will that be?