From d882bb8e803dd549b79827e97818fd651f8074e8 Mon Sep 17 00:00:00 2001
From: Karol Blaszczak
Date: Tue, 26 Nov 2024 10:12:42 +0100
Subject: [PATCH] Docs post release polishes port mstr (#27739)

port:
https://github.com/openvinotoolkit/openvino/pull/27643
https://github.com/openvinotoolkit/openvino/pull/27682
https://github.com/openvinotoolkit/openvino/pull/27684
---
 .../about-openvino/release-notes-openvino.rst         |  6 +++---
 .../release-notes-openvino/system-requirements.rst    |  4 ++--
 .../llm_inference_guide/genai-guide-npu.rst           | 11 ++++++-----
 3 files changed, 11 insertions(+), 10 deletions(-)

diff --git a/docs/articles_en/about-openvino/release-notes-openvino.rst b/docs/articles_en/about-openvino/release-notes-openvino.rst
index 343c9e780f05dc..9e7673d7d0910d 100644
--- a/docs/articles_en/about-openvino/release-notes-openvino.rst
+++ b/docs/articles_en/about-openvino/release-notes-openvino.rst
@@ -32,7 +32,7 @@ What's new
 * New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
 * LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
-  Mini-Instruct.
+  Mini-Instruct.
 * Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
 * Preview: support for Flax, a high-performance Python neural network library based on JAX.
   Its modular design allows for easy customization and accelerated inference on GPUs.

@@ -87,8 +87,8 @@ Common
 * A new constant constructor has been added, enabling constants to be created from data pointer
   as shared memory. Additionally, it can take ownership of a shared, or other, object, avoiding
   a two-step process to wrap memory into ``ov::Tensor``.
-* Files are now read via the async ReadFile API, reducing the bottleneck for LLM model load
-  times on GPU.
+* Asynchronous file reading with mmap library has been implemented, reducing loading times for
+  model files, especially for LLMs.
 * CPU implementation of SliceScatter operator is now available, used for models such as Gemma,
   supporting increased LLM performance.

diff --git a/docs/articles_en/about-openvino/release-notes-openvino/system-requirements.rst b/docs/articles_en/about-openvino/release-notes-openvino/system-requirements.rst
index a12cacf8402953..79a9f63821c16f 100644
--- a/docs/articles_en/about-openvino/release-notes-openvino/system-requirements.rst
+++ b/docs/articles_en/about-openvino/release-notes-openvino/system-requirements.rst
@@ -37,7 +37,7 @@ CPU
       * Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
       * macOS 12.6 and above, 64-bit and ARM64
       * CentOS 7
-      * Red Hat Enterprise Linux 9.3-9.4, 64-bit
+      * Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit
       * openSUSE Tumbleweed, 64-bit and ARM64
       * Ubuntu 20.04 ARM64

@@ -65,7 +65,7 @@ GPU
       * Ubuntu 22.04 long-term support (LTS), 64-bit
       * Ubuntu 20.04 long-term support (LTS), 64-bit
       * CentOS 7
-      * Red Hat Enterprise Linux 9.3-9.4, 64-bit
+      * Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit

    .. tab-item:: Additional considerations

diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
index 5a641300a68edb..41e5cbb5733c58 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide-npu.rst
@@ -20,21 +20,22 @@ Install required dependencies:
    pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
    pip install --pre openvino openvino-tokenizers openvino-genai --extra-index-url https://storage.openvinotoolkit.org/simple/wheels/nightly

-NOTE that for systems based on Intel® Core Ultra Processors Series 2 and 16 GB of RAM,
-prompts longer then 1024 characters will not work with a model of 7B or more parameters,
+Note that for systems based on Intel® Core™ Ultra Processors Series 2, more than 16GB of RAM
+may be required to run prompts over 1024 tokens on models exceeding 7B parameters,
 such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.

 Export an LLM model via Hugging Face Optimum-Intel
 ##################################################

-Since **symmetrically-quantized 4-bit (INT4) models are preffered for inference on NPU**, make sure to export
-the model with the proper conversion and optimization settings.
+Since **symmetrically-quantized 4-bit (INT4) models are preffered for inference on NPU**, make
+sure to export the model with the proper conversion and optimization settings.

 | You may export LLMs via Optimum-Intel, using one of two compression methods:
 | **group quantization** - for both smaller and larger models,
 | **channel-wise quantization** - remarkably effective but for models exceeding 1 billion parameters.

-You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or ``-1``, respectively. See the following examples:
+You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or
+``-1``, respectively. See the following examples:

 .. tab-set::