
v1.14.0: LCMs, SpeechT5, Falcon, Mistral, decoder refactorization

Released by @echarlaix on 07 Nov 13:54

ONNX

New architectures

  • Falcon
  • SpeechT5
  • Mistral
  • TrOCR
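
Each of these can be exported with the same CLI flow as the other supported architectures; as a minimal sketch, exporting Falcon (the tiiuae/falcon-7b checkpoint is used purely for illustration) looks like:

optimum-cli export onnx --model tiiuae/falcon-7b falcon_onnx/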

LCMs

Enable ONNX export and ORT inference for LCMs (available in diffusers since v0.22.0) by @echarlaix in #1469

from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

# export=True converts the diffusers checkpoint to ONNX on the fly
pipe = ORTLatentConsistencyModelPipeline.from_pretrained("SimianLuo/LCM_Dreamshaper_v7", export=True)
prompt = "sailing ship in storm by Leonardo da Vinci"
# LCMs produce usable images in very few denoising steps
images = pipe(prompt=prompt, num_inference_steps=4, guidance_scale=8.0).images

The export can also be done through the CLI:

optimum-cli export onnx --model SimianLuo/LCM_Dreamshaper_v7 lcm_onnx/
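
The exported directory can then be loaded back for inference without re-exporting; a minimal sketch, assuming the lcm_onnx/ output path from the command above:

from optimum.onnxruntime import ORTLatentConsistencyModelPipeline

# Load the already-exported ONNX pipeline from the local directory
pipe = ORTLatentConsistencyModelPipeline.from_pretrained("lcm_onnx/")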

Decoder refactorization

  • Add position ids as input during ONNX export by @fxmarty in #1381
  • Enable the export of only one decoder for decoder-only models by @echarlaix in #1257 (see the sketch after this list)
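
As a minimal sketch of the user-facing effect, loading a decoder-only model now goes through a single exported decoder rather than separate with-past/without-past variants (gpt2 is used for illustration, and the use_cache argument is an assumption to keep past key/values enabled):

from optimum.onnxruntime import ORTModelForCausalLM

# export=True converts the model to ONNX on the fly; a single decoder is
# exported instead of one per with/without-past branch
model = ORTModelForCausalLM.from_pretrained("gpt2", export=True, use_cache=True)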

GPTQ

  • Enable choosing exllamav2 kernels for GPTQ models by @SunMarc in #1419
  • Disable exllamav2 for quantization by @SunMarc in #1482
  • Default to exllama kernels when exllamav2 is disabled by @SunMarc in #1494
  • Add a cache_block_outputs parameter to handle models with a non-regular structure, such as ChatGLM, by @AlexKoff88 in #1479
  • Add support for CPU inference by @vivekkhandelwal1 in #1496
  • Fix the minimum required version of auto-gptq by @fxmarty in #1504
  • Switch to an exllama_config argument instead of disabling exllamav2 by @SunMarc in #1505 (see the sketch after this list)
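
As a hedged sketch of what kernel selection via exllama_config looks like from the transformers side (the checkpoint name is illustrative, and exact behavior depends on your transformers and auto-gptq versions):

from transformers import AutoModelForCausalLM, GPTQConfig

# Request the exllamav2 kernels through exllama_config instead of a disable flag
gptq_config = GPTQConfig(bits=4, exllama_config={"version": 2})
model = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Llama-2-7B-GPTQ",  # illustrative pre-quantized GPTQ checkpoint
    device_map="auto",
    quantization_config=gptq_config,
)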

Other changes and bugfixes

New Contributors