Docs post release polishes port mstr (#27739)
kblaszczak-intel authored Nov 26, 2024
1 parent c02e2ac commit d882bb8
Showing 3 changed files with 11 additions and 10 deletions.
6 changes: 3 additions & 3 deletions docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -32,7 +32,7 @@ What's new

* New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
* LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
Mini-Instruct.
Mini-Instruct.
* Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
* Preview: support for Flax, a high-performance Python neural network library based on JAX.
Its modular design allows for easy customization and accelerated inference on GPUs.
@@ -87,8 +87,8 @@ Common
* A new constant constructor has been added, enabling constants to be created from data pointer
as shared memory. Additionally, it can take ownership of a shared, or other, object, avoiding
a two-step process to wrap memory into ``ov::Tensor``.
* Files are now read via the async ReadFile API, reducing the bottleneck for LLM model load
times on GPU.
* Asynchronous file reading with mmap library has been implemented, reducing loading times for
model files, especially for LLMs.
* CPU implementation of SliceScatter operator is now available, used for models such as Gemma,
supporting increased LLM performance.

@@ -37,7 +37,7 @@ CPU
* Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
* macOS 12.6 and above, 64-bit and ARM64
* CentOS 7
* Red Hat Enterprise Linux 9.3-9.4, 64-bit
* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit
* openSUSE Tumbleweed, 64-bit and ARM64
* Ubuntu 20.04 ARM64

@@ -65,7 +65,7 @@ GPU
* Ubuntu 22.04 long-term support (LTS), 64-bit
* Ubuntu 20.04 long-term support (LTS), 64-bit
* CentOS 7
* Red Hat Enterprise Linux 9.3-9.4, 64-bit
* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit

.. tab-item:: Additional considerations

@@ -20,21 +20,22 @@ Install required dependencies:
pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
pip install --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly
NOTE that for systems based on Intel® Core Ultra Processors Series 2 and 16 GB of RAM,
prompts longer than 1024 characters will not work with a model of 7B or more parameters,
Note that for systems based on Intel® Core Ultra Processors Series 2, more than 16GB of RAM
may be required to run prompts over 1024 tokens on models exceeding 7B parameters,
such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.

Export an LLM model via Hugging Face Optimum-Intel
##################################################

Since **symmetrically-quantized 4-bit (INT4) models are preferred for inference on NPU**, make sure to export
the model with the proper conversion and optimization settings.
Since **symmetrically-quantized 4-bit (INT4) models are preferred for inference on NPU**, make
sure to export the model with the proper conversion and optimization settings.

| You may export LLMs via Optimum-Intel, using one of two compression methods:
| **group quantization** - for both smaller and larger models,
| **channel-wise quantization** - remarkably effective but for models exceeding 1 billion parameters.
You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or ``-1``, respectively. See the following examples:
You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or
``-1``, respectively. See the following examples:

.. tab-set::

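
The examples that the last hunk refers to are collapsed in this view. As a rough illustration only, the choice between group quantization (``--group-size 128``) and channel-wise quantization (``--group-size -1``) could be sketched as the small helper below, which builds an ``optimum-cli export openvino`` command line without running it. Only ``--group-size`` comes from the text above; the other flags (``--weight-format int4``, ``--sym``, ``--ratio 1.0``), the helper name ``build_export_command``, and the model ID are assumptions made for this sketch, not values taken from the commit.

```python
import shlex


def build_export_command(model_id, output_dir, channel_wise=False):
    """Build (but do not run) an INT4 symmetric export command for optimum-cli.

    Hypothetical helper for illustration. Apart from --group-size, the
    flags below are assumptions and are not taken from the commit itself.
    """
    # 128 selects group quantization, -1 selects channel-wise quantization.
    group_size = -1 if channel_wise else 128
    return [
        "optimum-cli", "export", "openvino",
        "-m", model_id,
        "--weight-format", "int4",  # assumed: 4-bit weight compression
        "--sym",                    # assumed: symmetric quantization
        "--ratio", "1.0",           # assumed: compress all eligible layers to INT4
        "--group-size", str(group_size),
        output_dir,
    ]


if __name__ == "__main__":
    # The model ID and output directory are placeholders.
    for channel_wise in (False, True):
        cmd = build_export_command(
            "meta-llama/Llama-2-7b-chat-hf", "Llama-2-7b-chat-hf", channel_wise
        )
        print(shlex.join(cmd))
```

Running the script prints the two command lines so they can be reviewed before executing them in an environment where the dependencies listed above are installed.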
