[DOCS] post-release polishes no1 #27643

Merged

@@ -32,7 +32,7 @@ What's new

* New models supported: Llama 3.2 (1B & 3B), Gemma 2 (2B & 9B), and YOLO11.
* LLM support on NPU: Llama 3 8B, Llama 2 7B, Mistral-v0.2-7B, Qwen2-7B-Instruct and Phi-3
-  Mini-Instruct.
+  Mini-Instruct.
* Noteworthy notebooks added: Sam2, Llama3.2, Llama3.2 - Vision, Wav2Lip, Whisper, and Llava.
* Preview: support for Flax, a high-performance Python neural network library based on JAX.
Its modular design allows for easy customization and accelerated inference on GPUs.

@@ -37,7 +37,7 @@ CPU
* Ubuntu 20.04 long-term support (LTS), 64-bit (Kernel 5.15+)
* macOS 12.6 and above, 64-bit and ARM64
* CentOS 7
-* Red Hat Enterprise Linux 9.3-9.4, 64-bit
+* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit
* openSUSE Tumbleweed, 64-bit and ARM64
* Ubuntu 20.04 ARM64

@@ -65,7 +65,7 @@ GPU
* Ubuntu 22.04 long-term support (LTS), 64-bit
* Ubuntu 20.04 long-term support (LTS), 64-bit
* CentOS 7
-* Red Hat Enterprise Linux 9.3-9.4, 64-bit
+* Red Hat Enterprise Linux (RHEL) 8 and 9, 64-bit

.. tab-item:: Additional considerations

@@ -20,21 +20,22 @@ Install required dependencies:

pip install nncf==2.12 onnx==1.16.1 optimum-intel==1.19.0
pip install openvino==2024.5 openvino-tokenizers==2024.5 openvino-genai==2024.5

-NOTE that for systems based on Intel® Core Ultra Processors Series 2 and 16 GB of RAM,
-prompts longer then 1024 characters will not work with a model of 7B or more parameters,
+Note that for systems based on Intel® Core Ultra Processors Series 2, more than 16 GB of RAM
+may be required to run prompts over 1024 tokens on models exceeding 7B parameters,
such as Llama-2-7B, Mistral-0.2-7B, and Qwen-2-7B.
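
For context, running one of these exported models on NPU with OpenVINO GenAI looks roughly
like the sketch below. This is an illustrative aside, not part of the diff; the model
directory name is a placeholder for your own export location:

   import openvino_genai as ov_genai

   # Load a locally exported INT4 model on the NPU device
   # ("Llama-2-7b-chat-hf-int4" is a placeholder path).
   pipe = ov_genai.LLMPipeline("Llama-2-7b-chat-hf-int4", "NPU")

   # Long prompts on 7B+ models are where the RAM note above applies.
   print(pipe.generate("What is OpenVINO?", max_new_tokens=100))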

Export an LLM model via Hugging Face Optimum-Intel
##################################################

-Since **symmetrically-quantized 4-bit (INT4) models are preffered for inference on NPU**, make sure to export
-the model with the proper conversion and optimization settings.
+Since **symmetrically-quantized 4-bit (INT4) models are preferred for inference on NPU**, make
+sure to export the model with the proper conversion and optimization settings.

| You may export LLMs via Optimum-Intel, using one of two compression methods:
| **group quantization** - for both smaller and larger models,
| **channel-wise quantization** - remarkably effective, but only for models exceeding 1 billion parameters.

-You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or ``-1``, respectively. See the following examples:
+You select one of the methods by setting the ``--group-size`` parameter to either ``128`` or
+``-1``, respectively. See the following examples:

.. tab-set::
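
The example commands themselves are collapsed in this diff view. For reference, an
Optimum-Intel export typically looks like the following console commands (model and output
directory names are illustrative, not taken from the diff):

   # Channel-wise quantization (models over 1 billion parameters): --group-size -1
   optimum-cli export openvino -m meta-llama/Llama-2-7b-chat-hf \
       --weight-format int4 --sym --ratio 1.0 --group-size -1 Llama-2-7b-chat-hf-int4

   # Group quantization (suits smaller models as well): --group-size 128
   optimum-cli export openvino -m TinyLlama/TinyLlama-1.1B-Chat-v1.0 \
       --weight-format int4 --sym --group-size 128 TinyLlama-1.1B-Chat-int4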
