[DOCS] post-release polishes no1 #27643

Merged

@@ -32,7 +32,7 @@ What's new

* New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
* LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
-  Mini-Instruct.
+  Mini-Instruct.
* Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
* Preview: support for Flax, a high-performance Python neural network library based on JAX.
Its modular design allows for easy customization and accelerated inference on GPUs.

@@ -37,7 +37,7 @@ CPU
* Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
* macOS 12.6 and above, 64-bit and ARM64
* CentOS 7
-* Red Hat Enterprise Linux 9.3-9.4, 64-bit
+* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit
* openSUSE Tumbleweed, 64-bit and ARM64
* Ubuntu 20.04 ARM64

@@ -65,7 +65,7 @@ GPU
* Ubuntu 22.04 long-term support (LTS), 64-bit
* Ubuntu 20.04 long-term support (LTS), 64-bit
* CentOS 7
-* Red Hat Enterprise Linux 9.3-9.4, 64-bit
+* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit

.. tab-item:: Additional considerations

@@ -20,21 +20,22 @@ Install required dependencies:

pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
pip install openvino==2024.5 openvino-tokenizers==2024.5 openvino-genai==2024.5

-NOTE that for systems based on Intel® Core Ultra Processors Series 2 and 16 GB of RAM,
-prompts longer then 1024 characters will not work with a model of 7B or more parameters,
+Note that for systems based on Intel® Core Ultra Processors Series 2, more than 16 GB of RAM
+may be required to run prompts over 1024 tokens on models exceeding 7B parameters,
such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.
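
For context, running one of these exported models on NPU with OpenVINO GenAI looks roughly
like the sketch below. This is an illustrative aside, not part of the diff; the model
directory name is a placeholder for your own export location:

   import openvino_genai as ov_genai

   # Load a locally exported INT4 model on the NPU device
   # ("Llama-2-7b-chat-hf-int4" is a placeholder path).
   pipe = ov_genai.LLMPipeline("Llama-2-7b-chat-hf-int4", "NPU")

   # Long prompts on 7B+ models are where the RAM note above applies.
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))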

Export an LLM model via Hugging Face Optimum-Intel
##################################################

-Since **symmetrically-quantized 4-bit (INT4) models are preffered for inference on NPU**, make sure to export
-the model with the proper conversion and optimization settings.
+Since **symmetrically-quantized 4-bit (INT4) models are preferred for inference on NPU**, make
+sure to export the model with the proper conversion and optimization settings.

| You may export LLMs via Optimum-Intel, using one of two compression methods:
| **group quantization** - for both smaller and larger models,
| **channel-wise quantization** - remarkably effective, but only for models exceeding 1 billion parameters.

-You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or ``-1``, respectively. See the following examples:
+You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or
+``-1``, respectively. See the following examples:

.. tab-set::
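
The example commands themselves are collapsed in this diff view. For reference, an
Optimum-Intel export typically looks like the following console commands (model and output
directory names are illustrative, not taken from the diff):

   # Channel-wise quantization (models over 1 billion parameters): --group-size -1
   optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf \
       --weight-format int4 --sym --ratio 1.0 --group-size -1 Llama-2-7b-chat-hf-int4

   # Group quantization (suits smaller models as well): --group-size 128
   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
       --weight-format int4 --sym --group-size 128 TinyLlama-1.1B-Chat-int4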
