Skip to content

Commit

Permalink
fix: docs
Browse files Browse the repository at this point in the history
  • Loading branch information
OlivierDehaene committed Feb 29, 2024
1 parent ca99fa3 commit 7b85d8c
Show file tree
Hide file tree
Showing 4 changed files with 21 additions and 21 deletions.
22 changes: 11 additions & 11 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -97,7 +97,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
```

And then you can make requests like
Expand Down Expand Up @@ -245,13 +245,13 @@ Text Embeddings Inference ships with multiple Docker images that you can use to

| Architecture | Image |
|-------------------------------------|-------------------------------------------------------------------------|
| CPU | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0 |
| CPU | ghcr.io/huggingface/text-embeddings-inference:cpu-1.1 |
| Volta | NOT SUPPORTED |
| Turing (T4, RTX 2000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental) |
| Ampere 80 (A100, A30) | ghcr.io/huggingface/text-embeddings-inference:1.0 |
| Ampere 86 (A10, A40, ...) | ghcr.io/huggingface/text-embeddings-inference:86-1.0 |
| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0 |
| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental) |
| Turing (T4, RTX 2000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:turing-1.1 (experimental) |
| Ampere 80 (A100, A30) | ghcr.io/huggingface/text-embeddings-inference:1.1 |
| Ampere 86 (A10, A40, ...) | ghcr.io/huggingface/text-embeddings-inference:86-1.1 |
| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.1 |
| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-1.1 (experimental) |

**Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.
Expand Down Expand Up @@ -280,7 +280,7 @@ model=<your private model>
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
token=<your cli READ token>

docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
```

### Using Re-rankers models
Expand All @@ -298,7 +298,7 @@ model=BAAI/bge-reranker-large
revision=refs/pr/4
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
```

And then you can rank the similarity between a query and a list of texts with:
Expand All @@ -318,7 +318,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
model=SamLowe/roberta-base-go_emotions
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
```

Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
Expand Down Expand Up @@ -347,7 +347,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0-grpc --model-id $model --revision $revision
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1-grpc --model-id $model --revision $revision
```

```shell
Expand Down
2 changes: 1 addition & 1 deletion docs/source/en/private_models.md
Original file line number Diff line number Diff line change
Expand Up @@ -37,5 +37,5 @@ model=<your private model>
volume=$PWD/data
token=<your cli Hugging Face Hub token>

docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
docker run --gpus all -e HUGGING_FACE_HUB_TOKEN=$token -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
```
6 changes: 3 additions & 3 deletions docs/source/en/quick_tour.md
Original file line number Diff line number Diff line change
Expand Up @@ -34,7 +34,7 @@ model=BAAI/bge-large-en-v1.5
revision=refs/pr/5
volume=$PWD/data

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
```

<Tip>
Expand Down Expand Up @@ -69,7 +69,7 @@ model=BAAI/bge-reranker-large
revision=refs/pr/4
volume=$PWD/data

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model --revision $revision
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model --revision $revision
```

Once you have deployed a model you can use the `rerank` endpoint to rank the similarity between a query and a list
Expand All @@ -90,7 +90,7 @@ You can also use classic Sequence Classification models like `SamLowe/roberta-ba
model=SamLowe/roberta-base-go_emotions
volume=$PWD/data

docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.0 --model-id $model
docker run --gpus all -p 8080:80 -v $volume:/data --pull always ghcr.io/huggingface/text-embeddings-inference:1.1 --model-id $model
```

Once you have deployed the model you can use the `predict` endpoint to get the emotions most associated with an input:
Expand Down
12 changes: 6 additions & 6 deletions docs/source/en/supported_models.md
Original file line number Diff line number Diff line change
Expand Up @@ -63,13 +63,13 @@ Find the appropriate Docker image for your hardware in the following table:

| Architecture | Image |
|-------------------------------------|--------------------------------------------------------------------------|
| CPU | ghcr.io/huggingface/text-embeddings-inference:cpu-1.0 |
| CPU | ghcr.io/huggingface/text-embeddings-inference:cpu-1.1 |
| Volta | NOT SUPPORTED |
| Turing (T4, RTX 2000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:turing-1.0 (experimental) |
| Ampere 80 (A100, A30) | ghcr.io/huggingface/text-embeddings-inference:1.0 |
| Ampere 86 (A10, A40, ...) | ghcr.io/huggingface/text-embeddings-inference:86-1.0 |
| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.0 |
| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-1.0 (experimental) |
| Turing (T4, RTX 2000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:turing-1.1 (experimental) |
| Ampere 80 (A100, A30) | ghcr.io/huggingface/text-embeddings-inference:1.1 |
| Ampere 86 (A10, A40, ...) | ghcr.io/huggingface/text-embeddings-inference:86-1.1 |
| Ada Lovelace (RTX 4000 series, ...) | ghcr.io/huggingface/text-embeddings-inference:89-1.1 |
| Hopper (H100) | ghcr.io/huggingface/text-embeddings-inference:hopper-1.1 (experimental) |

**Warning**: Flash Attention is turned off by default for the Turing image as it suffers from precision issues.
You can turn Flash Attention v1 ON by using the `USE_FLASH_ATTENTION=True` environment variable.

0 comments on commit 7b85d8c

Please sign in to comment.