ROCM build fixes (#403)
Co-authored-by: root <[email protected]>
mht-sharma and root authored Nov 5, 2024
1 parent 80259c9 commit cef41a4
Showing 3 changed files with 13 additions and 11 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/build_rocm.yaml
@@ -95,7 +95,7 @@
uses: docker/build-push-action@v4
with:
context: .
- file: Dockerfile-cuda
+ file: Dockerfile-rocm
push: ${{ github.event_name != 'pull_request' }}
platforms: 'linux/amd64'
build-args: |
@@ -130,7 +130,7 @@
with:
context: .
target: grpc
- file: Dockerfile-cuda
+ file: Dockerfile-rocm
push: ${{ github.event_name != 'pull_request' }}
platforms: 'linux/amd64'
build-args: |
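The two hunks above switch the ROCm image builds from `Dockerfile-cuda` to `Dockerfile-rocm`. A minimal local sketch of an equivalent build, assuming the repository root as build context and purely illustrative tag names:

```bash
# Build the ROCm image locally (tag name is illustrative)
docker build -f Dockerfile-rocm -t text-embeddings-inference:rocm-local .

# Build the gRPC variant, mirroring the workflow's `target: grpc`
docker build -f Dockerfile-rocm --target grpc -t text-embeddings-inference:rocm-grpc-local .
```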
15 changes: 8 additions & 7 deletions Dockerfile-rocm
@@ -81,8 +81,9 @@ RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-ins
hipsparse-dev \
hipblas-dev \
hipblaslt-dev \
- rocblas-dev \
hiprand-dev \
+ hipsolver-dev \
+ rocblas-dev \
rocrand-dev \
&& rm -rf /var/lib/apt/lists/*

@@ -105,17 +106,17 @@ RUN chmod +x ~/mambaforge.sh && \
# Install flash-attention, torch dependencies
RUN pip install numpy einops ninja --no-cache-dir

- RUN pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
-
- ARG DEFAULT_USE_FLASH_ATTENTION=True
- COPY backends/python/Makefile-flash-att-v2 Makefile-flash-att-v2
- RUN make -f Makefile-flash-att-v2 install-flash-attention-v2-rocm
-
# Install python backend
COPY backends/python/server /tei_backends/python/server
COPY backends/proto tei_backends/proto
RUN make -C /tei_backends/python/server install

+ RUN pip install --force-reinstall torch==$PYTORCH_VERSION --index-url https://download.pytorch.org/whl/rocm6.0
+
+ ARG DEFAULT_USE_FLASH_ATTENTION=True
+ COPY backends/python/Makefile-flash-att-v2 Makefile-flash-att-v2
+ RUN make -f Makefile-flash-att-v2 install-flash-attention-v2-rocm
+
ENV HUGGINGFACE_HUB_CACHE=/data \
PORT=80 \
USE_FLASH_ATTENTION=$DEFAULT_USE_FLASH_ATTENTION
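The hunk above moves the flash-attention build after the Python backend install and force-reinstalls the ROCm wheel of torch, presumably so that the backend's dependencies cannot replace it with a non-ROCm build. A quick sanity check inside the built image (a hedged sketch; it assumes the image's default `python` is the environment the backend uses):

```bash
# Confirm the ROCm (HIP) build of torch is still active after the backend install.
# torch.version.hip is only populated on ROCm builds; it is None on CPU/CUDA wheels.
python -c "import torch; print(torch.__version__, torch.version.hip)"
```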
5 changes: 3 additions & 2 deletions docs/source/en/local_amd_gpu.md
@@ -18,16 +18,17 @@ rendered properly in your Markdown viewer.

Text-Embeddings-Inference supports the [AMD GPUs officially supporting ROCm](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/reference/system-requirements.html), including AMD Instinct MI210, MI250, MI300 and some of the AMD Radeon series GPUs.

- To leverage AMD GPUs, Text-Embeddings-Inference relies on its Python backend, and not on the [candle](https://github.com/huggingface/candle) backend that is used for CPU, Nvidia GPUs and Metal. The support in the python backend is more limited (Bert embeddings) but easily extendible. We welcome contributions to extend the supported models.
+ To leverage AMD GPUs, Text-Embeddings-Inference relies on its Python backend, and not on the [candle](https://github.com/huggingface/candle) backend that is used for CPU, Nvidia GPUs and Metal. The support in the python backend is more limited (Bert embeddings) but easily extensible. We welcome contributions to extend the supported models.

## Usage through docker

Using docker is the recommended approach.

```bash
+ DOCKER_TAG=rocm-xxx # Specify the tag of the docker image to use
docker run --rm -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --net host \
--device=/dev/kfd --device=/dev/dri --group-add video --ipc=host --shm-size 32g \
- ghcr.io/huggingface/text-embeddings-inference:rocm-1.2.4 \
+ ghcr.io/huggingface/text-embeddings-inference:$DOCKER_TAG \
--model-id sentence-transformers/all-MiniLM-L6-v2
```
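Once the container is running (the Dockerfile sets `PORT=80` and the example uses `--net host`), a quick smoke test could look like the following; the prompt text is only an example:

```bash
# Request embeddings from the running server (default port 80, host networking)
curl 127.0.0.1:80/embed \
    -X POST \
    -d '{"inputs": "What is Deep Learning?"}' \
    -H 'Content-Type: application/json'
```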
