v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization
## ONNX
### New architectures
#### Falcon

#### SpeechT5

#### Mistral
- Add Mistral models ONNX export support by @echarlaix in #1425 (see the example below)
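As a quick illustration, a Mistral checkpoint can be exported and run with ONNX Runtime in one step. This is a minimal sketch; the model ID below is just an example of a Mistral checkpoint on the Hub:

```python
from transformers import AutoTokenizer
from optimum.onnxruntime import ORTModelForCausalLM

# Example Mistral checkpoint; export=True converts the model to ONNX on the fly
model_id = "mistralai/Mistral-7B-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = ORTModelForCausalLM.from_pretrained(model_id, export=True)

inputs = tokenizer("My favorite framework is", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```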
#### TrOCR
### LCMs
Enable Latent Consistency Models (LCMs, available in diffusers since v0.22.0) ONNX export and ORT inference by @echarlaix in #1469:
```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images
```
Also enable ONNX export using the CLI:
```bash
optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
```
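The exported pipeline can then be loaded back from the local directory for inference, for example:

```python
from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

# Load the ONNX pipeline produced by the CLI command above
pipe = ORTLatentConsistencyModelPipeline.from_pretrained("lcm_onnx/")
images = pipe(prompt="sailing ship in storm by Leonardo da Vinci", num_inference_steps=4, guidance_scale=8.0).images
```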
### Decoder refactorization
- Add position ids as input during ONNX export by @fxmarty in #1381
- Enable the export of only one decoder for decoder-only models by @echarlaix in #1257 (see the CLI example below)
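For example, a decoder-only model can now be exported with KV cache support as a single decoder through the CLI (gpt2 here is just an example checkpoint):

```bash
# Export GPT-2 with past key/values support; a single decoder is produced
optimum-cli export onnx --model gpt2 --task text-generation-with-past gpt2_onnx/
```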
## GPTQ
- Add the possibility to choose exllamav2 kernels for GPTQ models by @SunMarc in #1419
- Disable exllamav2 for quantization by @SunMarc in #1482
- Default to exllama when exllamav2 is disabled by @SunMarc in #1494
- Add a cache_block_outputs parameter to handle models with a non-regular structure such as ChatGLM by @AlexKoff88 in #1479
- Add support for CPU inference by @vivekkhandelwal1 in #1496
- Fix minimum version of auto-gptq by @fxmarty in #1504
- Switch to exllama_config instead of disabling exllamav2 by @SunMarc in #1505 (see the sketch below)
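Putting these together, the kernels can now be selected through exllama_config when loading a GPTQ model. A minimal sketch, assuming a recent transformers with GPTQConfig support, the auto-gptq and exllamav2 kernels installed, and an example pre-quantized GPTQ checkpoint from the Hub:

```python
from transformers import AutoModelForCausalLM, GPTQConfig

# Select the exllamav2 kernels via exllama_config ({"version": 1} would pick exllama)
quantization_config = GPTQConfig(bits=4, exllama_config={"version": 2})

model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-Chat-GPTQ",  # example GPTQ checkpoint, not from the release notes
    device_map="auto",
    quantization_config=quantization_config,
)
```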
## Other changes and bugfixes
- Fix wrong dtype in the ONNX export by @fxmarty in #1369
- Add support for loading quantization from config by @aarnphm in #1363
- Guard multiprocessing set start method by @fxmarty in #1377
- Do not output KV cache when not using `with-past` in the ONNX export by @fxmarty in #1358
- Fix provider availability check on ORT 1.16.0 release by @fxmarty in #1403
- Fix quantization for onnxruntime v1.16.0 by @echarlaix in #1405
- Fix normalized config key for models architecture by @echarlaix in #1408
- Fix arg in bettertransformer llama attention by @SunMarc in #1421
- Ignore .xml files for Stable Diffusion ORT downloads by @baskrahmer in #1428
- Falcon BetterTransformer requires transformers>=4.34 by @fxmarty in #1431
- Fix llama ONNX export by @fxmarty in #1432
- Update attention.py by @DongHande in #1416
- Remove SharedDDP as it was deprecated from Transformers by @AdamLouly in #1443
- Fix owlvit task detection by @fxmarty in #1453
- Improve ONNX quantization doc by @fxmarty in #1451
- Fix perceiver tests and dummy inputs for ONNX by @baskrahmer in #1449
- Disable BART ONNX export for text-classification and question-answering by @fxmarty in #1457
- Fix ONNX exporter library_name by @baskrahmer in #1460
- [ORT Training] Some important updates to ONNX Runtime training APIs by @JingyaHuang in #1335
- Fix typo in BetterTransformer CLIP by @fxmarty in #1468
- Fix custom architecture detection in the ONNX export by @fxmarty in #1472
- Fix whisper export by @mht-sharma in #1503
- Update Transformers dependency for Habana extra by @regisss in #1508
- Fix argument error by @ranchlai in #1501
- Remove attention mask patching by @fxmarty in #1509
- Fix generation input by @echarlaix in #1512
- Fix tests ORTModel by @fxmarty in #1517
- Fix BetterTransformer on the transformers 4.35 release by @fxmarty in #1518
## New Contributors
- @aarnphm made their first contribution in #1363
- @DongHande made their first contribution in #1416
- @AlexKoff88 made their first contribution in #1479
- @vivekkhandelwal1 made their first contribution in #1496
- @ranchlai made their first contribution in #1501