New Features
- This is the first official release of TensorRT Model Optimizer for Windows.
- ONNX INT4 Quantization: :meth:`modelopt.onnx.quantization.quantize_int4 <modelopt.onnx.quantization.int4.quantize>` supports INT4 quantization of ONNX models for DirectML and TensorRT* deployment. See :ref:`Support_Matrix` for details about supported features and models, and the usage sketch after these notes.
- LLM Quantization with Olive: Enabled LLM quantization through Olive, streamlining model optimization workflows. Refer to the example; a minimal workflow sketch also follows these notes.
- DirectML Deployment Guide: Added a deployment guide for DirectML (DML). See :ref:`DirectML_Deployment`.
- MMLU Benchmark for Accuracy Evaluation: Introduced MMLU benchmarking for accuracy evaluation of ONNX models on DirectML (DML).
- Quantized ONNX models collection: Published quantized ONNX models in the HuggingFace NVIDIA collections.
* This version includes experimental features such as TensorRT deployment of ONNX INT4 models, PyTorch quantization, and sparsity. These are currently unverified on Windows.
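
The following is a minimal sketch of calling the INT4 quantization API referenced above. It assumes the function accepts a path to an ONNX model and returns an ``onnx.ModelProto``; the ``calibration_method`` parameter name and the ``"awq_lite"`` value are assumptions to be confirmed against the API reference and :ref:`Support_Matrix`, and the file paths are placeholders.

.. code-block:: python

    import onnx

    from modelopt.onnx.quantization.int4 import quantize as quantize_int4

    # Placeholder paths; substitute your own ONNX model files.
    input_path = "model.onnx"
    output_path = "model.int4.onnx"

    # Quantize the model's weights to INT4. The calibration_method
    # parameter and its "awq_lite" value are assumptions; consult the
    # API docs for the exact signature and available methods.
    quantized_model = quantize_int4(input_path, calibration_method="awq_lite")

    # Assumes the call returns an onnx.ModelProto that can be saved directly.
    onnx.save(quantized_model, output_path)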
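
For the Olive integration, the sketch below shows one way a quantization workflow might be launched from Python. The ``olive.workflows.run`` entry point exists in the ``olive-ai`` package, but the config file name and its contents are hypothetical placeholders; adapt them from the published example.

.. code-block:: python

    # Minimal sketch of driving an Olive workflow from Python.
    from olive.workflows import run as olive_run

    # "olive_config.json" is a hypothetical workflow configuration that
    # would describe the input LLM, the quantization pass, and the output
    # location; see the Olive example for a working config.
    olive_run("olive_config.json")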