v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization
## ONNX
### New architectures
#### Falcon

#### SpeechT5

#### Mistral
- Add Mistral models ONNX export support by @echarlaix in #1425 (see the example below)
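As a quick illustration, a Mistral checkpoint can be exported and run with ONNX Runtime in one step. This is a minimal sketch; the model ID below is just an example of a Mistral checkpoint on the Hub:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

# Example Mistral checkpoint; export=True converts the model to ONNX on the fly
model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("My favorite framework is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```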
#### TrOCR
### LCMs
Enable Latent Consistency Models (LCMs, available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in #1469:
```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```
Also enable ONNX export using the CLI:
```bash
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
```
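The exported pipeline can then be loaded back from the local directory for inference, for example:

```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

# Load the ONNX pipeline produced by the CLI command above
pipe = ORTLatentConsistencyModelPipeline.from_pretrained("lcm_onnx/")
images = pipe(prompt="sailing ship in storm by Leonardo da Vinci", num_inference_steps=4, guidance_scale=8.0).images
```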
### Decoder refactorization
- Add position ids as input during ONNX export by @fxmarty in #1381
- Enable the export of only one decoder for decoder-only models by @echarlaix in #1257 (see the CLI example below)
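For example, a decoder-only model can now be exported with KV cache support as a single decoder through the CLI (gpt2 here is just an example checkpoint):

```bash
# Export GPT-2 with past key/values support; a single decoder is produced
optimum-cli export onnx --model gpt2 --task text-generation-with-past gpt2_onnx/
```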
## GPTQ
- Add the possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in #1419
- Disable exllamav2 for quantization by @SunMarc in #1482
- Default to exllama when exllamav2 is disabled by @SunMarc in #1494
- Add a cache_block_outputs parameter to handle models with a non-regular structure such as ChatGLM by @AlexKoff88 in #1479
- Add support for CPU inference by @vivekkhandelwal1 in #1496
- Fix minimum version of auto-gptq by @fxmarty in #1504
- Switch to exllama_config instead of disabling exllamav2 by @SunMarc in #1505 (see the sketch below)
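Putting these together, the kernels can now be selected through exllama_config when loading a GPTQ model. A minimal sketch, assuming a recent transformers with GPTQConfig support, the auto-gptq and exllamav2 kernels installed, and an example pre-quantized GPTQ checkpoint from the Hub:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Select the exllamav2 kernels via exllama_config ({"version": 1} would pick exllama)
quantization_config = GPTQConfig(bits=4, exllama_config={"version": 2})

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GPTQ",  # example GPTQ checkpoint, not from the release notes
    device_map="auto",
    quantization_config=quantization_config,
)
```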
## Other changes and bugfixes
- Fix wrong dtype in the ONNX export by @fxmarty in #1369
- Add support for loading quantization from config by @aarnphm in #1363
- Guard multiprocessing set start method by @fxmarty in #1377
- Do not output KV cache when not using `with-past` in the ONNX export by @fxmarty in #1358
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
- Fix quantization for onnxruntime v1.16.0 by @echarlaix in #1405
- Fix normalized config key for models architecture by @echarlaix in #1408
- Fix arg in bettertransformer llama attention by @SunMarc in #1421
- Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in #1428
- Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in #1431
- Fix llama ONNX export by @fxmarty in #1432
- Update attention.py by @DongHande in #1416
- Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in #1443
- Fix owlvit task detection by @fxmarty in #1453
- Improve ONNX quantization doc by @fxmarty in #1451
- Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in #1449
- Disable BART ONNX export for text-classification and question-answering by @fxmarty in #1457
- Fix ONNX exporter library_name by @baskrahmer in #1460
- [ORT Training] Some important updates to ONNX Runtime training APIs by @JingyaHuang in #1335
- Fix typo in BetterTransformer CLIP by @fxmarty in #1468
- Fix custom architecture detection in the ONNX export by @fxmarty in #1472
- Fix whisper export by @mht-sharma in #1503
- Update Transformers dependency for Habana extra by @regisss in #1508
- Fix argument error by @ranchlai in #1501
- Remove attention mask patching by @fxmarty in #1509
- Fix generation input by @echarlaix in #1512
- Fix tests ORTModel by @fxmarty in #1517
- Fix BetterTransformer on the transformers 4.35 release by @fxmarty in #1518
## New Contributors
- @aarnphm made their first contribution in #1363
- @DongHande made their first contribution in #1416
- @AlexKoff88 made their first contribution in #1479
- @vivekkhandelwal1 made their first contribution in #1496
- @ranchlai made their first contribution in #1501