Releases · huggingface/text-embeddings-inference
v1.5.1
What's Changed
- Download `model.onnx_data` by @kozistr in #343
- Rename 'Sentence Transformers' to 'sentence-transformers' in docstrings by @Wauplin in #342
- fix: add serde default for truncation direction by @drbh in #399
- fix: metrics unbounded memory by @OlivierDehaene in #409
- Fix to allow health check w/o auth by @kozistr in #360
- Update `ort` crate version to `2.0.0-rc.4` to support ONNX IR version 10 by @kozistr in #361
- adds curl to fix healthcheck by @WissamAntoun in #376
- fix: use num_cpus::get to check as get_physical does not check cgroups by @OlivierDehaene in #410
- fix: use status code 400 when batch is empty by @OlivierDehaene in #413
- fix: add cls pooling as default for BERT variants by @OlivierDehaene in #426
- feat: auto limit string if truncate is set by @OlivierDehaene in #428 (see the example below)
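Several of the fixes above touch the truncation behavior of the `/embed` route. The sketch below shows how a client might exercise them; the endpoint URL and the exact field names `truncate` and `truncation_direction` are assumptions inferred from the PR titles, not confirmed API details.

```python
import requests

# Hypothetical local TEI endpoint; adjust to your deployment.
TEI_URL = "http://localhost:8080"

# With `truncate` set, over-long inputs are automatically limited to the
# model's max sequence length (#428). The serde default fix (#399) means
# `truncation_direction` can also be omitted entirely.
resp = requests.post(
    f"{TEI_URL}/embed",
    json={
        "inputs": "A very long document that may exceed the model context...",
        "truncate": True,
        "truncation_direction": "Right",  # assumed field name and value
    },
)
resp.raise_for_status()
embedding = resp.json()[0]
print(len(embedding))  # embedding dimension

# An empty batch now yields HTTP 400 rather than a server error (#413).
bad = requests.post(f"{TEI_URL}/embed", json={"inputs": []})
print(bad.status_code)  # expected: 400
```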
New Contributors
- @Wauplin made their first contribution in #342
- @XciD made their first contribution in #345
- @WissamAntoun made their first contribution in #376
Full Changelog: v1.5.0...v1.5.1
v1.5.0
Notable Changes
- ONNX runtime for CPU deployments: greatly improves CPU deployment throughput
- Add `/similarity` route (example below)
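A sketch of calling the new `/similarity` route. The request schema below (a `source_sentence` scored against a list of `sentences`) is assumed to mirror the Hugging Face Inference API sentence-similarity task; the endpoint URL is a placeholder.

```python
import requests

TEI_URL = "http://localhost:8080"  # placeholder for your TEI deployment

# Assumed schema: one source sentence scored against several candidates.
payload = {
    "inputs": {
        "source_sentence": "What is deep learning?",
        "sentences": [
            "Deep learning is a subset of machine learning.",
            "The weather is nice today.",
        ],
    }
}

resp = requests.post(f"{TEI_URL}/similarity", json=payload)
resp.raise_for_status()
print(resp.json())  # e.g. one similarity score per candidate sentence
```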
What's Changed
- tokenizer max limit on input size by @ErikKaum in #324
- docs: air-gapped deployments by @OlivierDehaene in #326
- feat(onnx): add onnx runtime for better CPU perf by @OlivierDehaene in #328
- feat: add `/similarity` route by @OlivierDehaene in #331
- fix(ort): fix mean pooling by @OlivierDehaene in #332
- chore(candle): update flash attn by @OlivierDehaene in #335
- v1.5.0 by @OlivierDehaene in #336
New Contributors
- @ErikKaum made their first contribution in #324
Full Changelog: v1.4.0...v1.5.0
v1.4.0
Notable Changes
- Cuda support for the Qwen2 model architecture
What's Changed
- feat(candle): support Qwen2 on Cuda by @OlivierDehaene in #316
- fix(candle): fix last token pooling
Full Changelog: v1.3.0...v1.4.0
v1.3.0
Notable Changes
- New truncation direction parameter
- Cuda support for JinaCode model architecture
- Cuda support for Mistral model architecture
- Cuda support for Alibaba GTE model architecture
- New prompt name parameter: you can now pass a prompt name in the request body to prepend a pre-prompt to your input, based on the model's Sentence Transformers configuration. You can also set a default prompt or prompt name so that every request gets a pre-prompt. An example request is sketched below.
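A hedged sketch of the prompt name parameter: the field name `prompt_name`, the example prompt keys, and the endpoint URL are all assumptions based on the description above and on the Sentence Transformers `prompts` convention.

```python
import requests

TEI_URL = "http://localhost:8080"  # placeholder

# Assuming the model's sentence-transformers config defines prompts such as
# {"query": "query: ", "passage": "passage: "}, selecting a `prompt_name`
# prepends the matching pre-prompt to the input before embedding.
resp = requests.post(
    f"{TEI_URL}/embed",
    json={
        "inputs": "How do I deploy TEI on CPU?",
        "prompt_name": "query",  # assumed field name
    },
)
resp.raise_for_status()
print(resp.json()[0][:8])  # first few dimensions of the embedding
```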
What's Changed
- Ci migration to K8s by @glegendre01 in #269
- chore: map compute_cap from GPU name by @haixiw in #276
- chore: cover Nvidia T4/L4 GPU by @haixiw in #284
- feat(ci): add trufflehog secrets detection by @McPatate in #286
- Community contribution code of conduct by @LysandreJik in #291
- Update README.md by @michaelfeil in #277
- Upgrade tokenizers to 0.19.1 to deal with breaking change in tokenizers by @scriptator in #266
- Add env for OTLP service name by @kozistr in #285
- Fix CI build timeout by @fxmarty in #296
- fix(router): payload limit was not correctly applied by @OlivierDehaene in #298
- feat(candle): better cuda error by @OlivierDehaene in #300
- feat(router): add truncation direction parameter by @OlivierDehaene in #299
- Support for Jina Code model by @patricebechard in #292
- feat(router): add base64 encoding_format for OpenAI API by @OlivierDehaene in #301
- fix(candle): fix FlashJinaCodeModel by @OlivierDehaene in #302
- fix: use malloc_trim to cleanup pages by @OlivierDehaene in #307
- feat(candle): add FlashMistral by @OlivierDehaene in #308
- feat(candle): add flash gte by @OlivierDehaene in #310
- feat: add default prompts by @OlivierDehaene in #312
- Add optional CORS allow any option value in http server cli by @kir-gadjello in #260
- Update `HUGGING_FACE_HUB_TOKEN` to `HF_API_TOKEN` in README by @kevinhu in #263
- v1.3.0 by @OlivierDehaene in #313
New Contributors
- @haixiw made their first contribution in #276
- @McPatate made their first contribution in #286
- @LysandreJik made their first contribution in #291
- @michaelfeil made their first contribution in #277
- @scriptator made their first contribution in #266
- @fxmarty made their first contribution in #296
- @patricebechard made their first contribution in #292
- @kir-gadjello made their first contribution in #260
- @kevinhu made their first contribution in #263
Full Changelog: v1.2.3...v1.3.0
v1.2.3
What's Changed
- fix limit peak memory to build cuda-all docker image by @OlivierDehaene in #246
Full Changelog: v1.2.2...v1.2.3
v1.2.2
What's Changed
- fix(gke): accept null values for vertex env vars by @OlivierDehaene in #243
- fix: fix cpu image to not default on the sagemaker entrypoint
Full Changelog: v1.2.1...v1.2.2
v1.2.1
TEI is now Apache 2.0!
What's Changed
- Document how to send batched inputs by @osanseviero in #222 (see the example after this list)
- feat: add auto-truncate arg by @OlivierDehaene in #224
- feat: add PredictPair to proto by @OlivierDehaene in #225
- fix: fix auto_truncate for openai by @OlivierDehaene in #228
- Change license to Apache 2.0 by @OlivierDehaene in #231
- feat: Amazon SageMaker compatible images by @JGalego in #103
- fix(CI): fix build all by @OlivierDehaene in #236
- fix: fix cuda-all image by @OlivierDehaene in #239
- Add SageMaker CPU images and validate by @philschmid in #240
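The batched-inputs documentation from #222 boils down to sending a list of strings instead of a single string. A minimal sketch, with the endpoint URL assumed:

```python
import requests

TEI_URL = "http://localhost:8080"  # assumed local deployment

# Batched request: `inputs` takes a list of strings, and the response is a
# list of embeddings in the same order.
resp = requests.post(
    f"{TEI_URL}/embed",
    json={"inputs": ["The first sentence.", "The second sentence."]},
)
resp.raise_for_status()
embeddings = resp.json()
print(len(embeddings), len(embeddings[0]))  # 2 embeddings, model dimension each
```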
New Contributors
- @osanseviero made their first contribution in #222
- @JGalego made their first contribution in #103
- @philschmid made their first contribution in #240
Full Changelog: v1.2.0...v1.2.1
v1.2.0
What's Changed
- add cuda all image to facilitate deployment by @OlivierDehaene in #186
- add splade pooling to Bert by @OlivierDehaene in #187
- support vertex api endpoint by @drbh in #184
- readme examples by @plaggy in #180
- add_pooling_layer for bert classification by @OlivierDehaene in #190
- add /embed_sparse route by @OlivierDehaene in #191
- Applying `Cargo.toml` optimization options by @somehowchris in #201
- Add Dockerfile-arm64 to allow docker builds on Apple M1/M2 architecture by @iandoe in #209
- configurable payload limit by @OlivierDehaene in #210
- add api_key for request authorization by @OlivierDehaene in #211
- add all methods to vertex API by @OlivierDehaene in #192
- add `/decode` route by @OlivierDehaene in #212 (see the sketch after this list)
- Input Types Compatibility with OpenAI's API (#112) by @OlivierDehaene in #214
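Two items above lend themselves to a short client sketch: `api_key` request authorization (#211) and the `/decode` route (#212). The Bearer auth scheme, the server-side `--api-key` flag, and the `ids` request field are assumptions, not confirmed API details.

```python
import requests

TEI_URL = "http://localhost:8080"  # placeholder
API_KEY = "my-secret-key"          # value configured on the server (assumed --api-key flag)

# Assumed Bearer scheme for the api_key authorization added in #211.
headers = {"Authorization": f"Bearer {API_KEY}"}

# Assumed /decode schema: a list of token ids decoded back to text.
resp = requests.post(
    f"{TEI_URL}/decode",
    json={"ids": [101, 7592, 2088, 102]},  # example BERT-style token ids
    headers=headers,
)
resp.raise_for_status()
print(resp.json())
```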
New Contributors
- @drbh made their first contribution in #184
- @plaggy made their first contribution in #180
- @somehowchris made their first contribution in #201
- @iandoe made their first contribution in #209
Full Changelog: v1.1.0...v1.2.0
v1.1.0
Highlights
- Splade pooling
What's Changed
- Update Dockerfile to install curl by @jpbalarini in #117
- fix loading of bert classification models by @OlivierDehaene in #173
- splade pooling by @OlivierDehaene in #174
New Contributors
- @jpbalarini made their first contribution in #117
Full Changelog: v1.0.0...v1.1.0
v1.0.0
Highlights
- Support for Nomic models
- Support for Flash Attention for Jina models
- Metal backend for M* users
- `/tokenize` route to directly access the internal TEI tokenizer
- `/embed_all` route to allow client-level pooling (examples of both routes below)
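A sketch of the two new routes from the highlights. The request bodies are assumed to use the same `inputs` field as `/embed`, and the endpoint URL is a placeholder.

```python
import requests

TEI_URL = "http://localhost:8080"  # placeholder for your TEI deployment

# /tokenize exposes the internal tokenizer directly.
tok = requests.post(f"{TEI_URL}/tokenize", json={"inputs": "Hello world"})
tok.raise_for_status()
print(tok.json())  # expected: the tokens/ids produced for the input

# /embed_all skips server-side pooling and returns one vector per token,
# so pooling can be done client-side (here, a simple mean over tokens).
raw = requests.post(f"{TEI_URL}/embed_all", json={"inputs": "Hello world"})
raw.raise_for_status()
token_vectors = raw.json()[0]
mean_pooled = [sum(col) / len(token_vectors) for col in zip(*token_vectors)]
print(len(token_vectors), len(mean_pooled))
```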
What's Changed
- fix: limit the number of buckets for prom metrics by @OlivierDehaene in #114
- feat: support flash attention for Jina by @OlivierDehaene in #119
- feat: add support for Metal by @OlivierDehaene in #120
- fix: fix turing for Jina and limit concurrency in docker build by @OlivierDehaene in #121
- fix(router): fix panics on partial_cmp and empty req.texts by @OlivierDehaene in #138
- feat(router): add /tokenize route by @OlivierDehaene in #139
- feat(backend): support classification for bert by @OlivierDehaene in #155
- feat: add embed_raw route to get all embeddings without pooling by @OlivierDehaene in #154
- added docs for `OTLP_ENDPOINT` around the defaults and format sent by @MarcusDunn in #157
- fix: use mimalloc to solve memory "leak" by @OlivierDehaene in #161
- fix: remove modif of tokenizer by @OlivierDehaene in #163
- fix: add cors_allow_origin to cli by @OlivierDehaene in #162
- fix: use st max_seq_length by @OlivierDehaene in #167
- feat: support nomic models by @OlivierDehaene in #166
New Contributors
- @MarcusDunn made their first contribution in #157
Full Changelog: v0.6.0...v1.0.0