diff --git a/docs/articles_en/learn-openvino/llm_inference_guide.rst b/docs/articles_en/learn-openvino/llm_inference_guide.rst
index 5846d1a484737c..840853bc7d998c 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide.rst
@@ -20,12 +20,12 @@ Generative AI workflow
 Generative AI is a specific area of Deep Learning models used for producing new and “original”
 data, based on input in the form of image, sound, or natural language text. Due to their
 complexity and size, generative AI pipelines are more difficult to deploy and run efficiently.
-OpenVINO simplifies the process and ensures high-performance integrations, with the following
+OpenVINO™ simplifies the process and ensures high-performance integrations, with the following
 options:
 
 .. tab-set::
 
-   .. tab-item:: OpenVINO GenAI
+   .. tab-item:: OpenVINO™ GenAI
 
       | - Suggested for production deployment for the supported use cases.
       | - Smaller footprint and fewer dependencies.
@@ -39,6 +39,8 @@ options:
         text generation loop, tokenization, and scheduling, offering ease of use and high
         performance.
 
+      `Check out the OpenVINO GenAI Quick-start Guide [PDF] `__
+
    .. tab-item:: Hugging Face integration
 
       | - Suggested for prototyping and, if the use case is not covered by OpenVINO GenAI, production.
@@ -54,49 +56,34 @@ options:
         as well as conversion on the fly. For integration with the final product it may offer
         lower performance, though.
 
-`Check out the GenAI Quick-start Guide [PDF] `__
-
-The advantages of using OpenVINO for LLM deployment:
-
-.. dropdown:: Fewer dependencies and smaller footprint
-   :animate: fade-in-slide-down
-   :color: secondary
-
-   Less bloated than frameworks such as Hugging Face and PyTorch, with a smaller binary size and reduced
-   memory footprint, makes deployments easier and updates more manageable.
-
-.. dropdown:: Compression and precision management
-   :animate: fade-in-slide-down
-   :color: secondary
 
-   Techniques such as 8-bit and 4-bit weight compression, including embedding layers, and storage
-   format reduction. This includes fp16 precision for non-compressed models and int8/int4 for
-   compressed models, like GPTQ models from `Hugging Face `__.
 
-.. dropdown:: Enhanced inference capabilities
-   :animate: fade-in-slide-down
-   :color: secondary
+The advantages of using OpenVINO for generative model deployment:
 
-   Advanced features like in-place KV-cache, dynamic quantization, KV-cache quantization and
-   encapsulation, dynamic beam size configuration, and speculative sampling, and more are
-   available.
+| **Fewer dependencies and smaller footprint**
+| Less bloated than frameworks such as Hugging Face and PyTorch, OpenVINO has a smaller binary
+  size and a reduced memory footprint, which makes deployments easier and updates more manageable.
 
-.. dropdown:: Stateful model optimization
-   :animate: fade-in-slide-down
-   :color: secondary
+| **Compression and precision management**
+| Techniques such as 8-bit and 4-bit weight compression, including embedding layers, and storage
+  format reduction: fp16 precision for non-compressed models and int8/int4 for compressed
+  models, such as GPTQ models from `Hugging Face `__.
 
-   Models from the Hugging Face Transformers are converted into a stateful form, optimizing
-   inference performance and memory usage in long-running text generation tasks by managing past
-   KV-cache tensors more efficiently internally. This feature is automatically activated for
-   many supported models, while unsupported ones remain stateless. Learn more about the
-   :doc:`Stateful models and State API <../openvino-workflow/running-inference/stateful-models>`.
+| **Enhanced inference capabilities**
+| Advanced features such as in-place KV-cache, dynamic quantization, KV-cache quantization and
+  encapsulation, dynamic beam size configuration, speculative sampling, and more are
+  available.
 
-.. dropdown:: Optimized LLM inference
-   :animate: fade-in-slide-down
-   :color: secondary
+| **Stateful model optimization**
+| Models from Hugging Face Transformers are converted into a stateful form, which optimizes
+  inference performance and memory usage in long-running text generation tasks by managing past
+  KV-cache tensors internally and more efficiently. This feature is automatically activated for
+  many supported models, while unsupported ones remain stateless. Learn more about the
+  :doc:`Stateful models and State API <../openvino-workflow/running-inference/stateful-models>`.
 
-   Includes a Python API for rapid development and C++ for further optimization, offering
-   better performance than Python-based runtimes.
+| **Optimized LLM inference**
+| Provides a Python API for rapid development and a C++ API for further optimization, offering
+  better performance than Python-based runtimes.
 
 Proceed to guides on:
 
diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst
index 172586831252a9..eff30eed054295 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/genai-guide.rst
@@ -28,6 +28,10 @@ make sure to :doc:`install OpenVINO with GenAI <../../get-started/install-openvi
 
 .. dropdown:: Text-to-Image Generation
 
+   OpenVINO GenAI introduces the ``openvino_genai.Text2ImagePipeline`` class for inference of
+   text-to-image models such as Stable Diffusion 1.5, 2.1, XL, LCM, Flux, and more.
+   See the following usage example for reference.
+
    .. tab-set::
 
       .. tab-item:: Python
@@ -579,8 +583,9 @@ compression is done by NNCF at the model export stage. The exported model contai
 information necessary for execution, including the tokenizer/detokenizer and the generation
 config, ensuring that its results match those generated by Hugging Face.
 
-The `LLMPipeline` is the main object used for decoding and handles all the necessary steps.
-You can construct it directly from the folder with the converted model.
+The ``LLMPipeline`` is the main object used to set up the model for text generation. You can
+pass the converted model to this object, specify the device for inference, and provide
+additional parameters.
 
 .. tab-set::
 
@@ -911,7 +916,7 @@ running the following code:
 GenAI API
 #######################################
 
-The use case described here uses the following OpenVINO GenAI API methods:
+The use case described here uses the following OpenVINO GenAI API classes:
 
 * generation_config - defines a configuration class for text generation, enabling
   customization of the generation process such as the maximum length of
@@ -921,7 +926,6 @@ The use case described here uses the following OpenVINO GenAI API methods:
   text generation, and managing outputs with configurable options.
 * streamer_base - an abstract base class for creating streamers.
 * tokenizer - the tokenizer class for text encoding and decoding.
-* visibility - controls the visibility of the GenAI library.
 
 Learn more from the `GenAI API reference `__.
 
diff --git a/docs/articles_en/learn-openvino/llm_inference_guide/genai-model-preparation.rst b/docs/articles_en/learn-openvino/llm_inference_guide/genai-model-preparation.rst
index 53b8d5440ca855..e6d15675ea45b8 100644
--- a/docs/articles_en/learn-openvino/llm_inference_guide/genai-model-preparation.rst
+++ b/docs/articles_en/learn-openvino/llm_inference_guide/genai-model-preparation.rst
@@ -7,8 +7,8 @@ Generative Model Preparation
 
 
 
-Since generative AI models tend to be big and resource-heavy, it is advisable to store them
-locally and optimize for efficient inference. This article will show how to prepare
+Since generative AI models tend to be big and resource-heavy, it is advisable to
+optimize them for efficient inference. This article will show how to prepare
 LLM models for inference with OpenVINO by:
 
 * `Downloading Models from Hugging Face <#download-generative-models-from-hugging-face-hub>`__
diff --git a/docs/sphinx_setup/_static/download/GenAI_Quick_Start_Guide.pdf b/docs/sphinx_setup/_static/download/GenAI_Quick_Start_Guide.pdf
index 90ad7bd6b000b4..13edfc8f0b7bc2 100644
Binary files a/docs/sphinx_setup/_static/download/GenAI_Quick_Start_Guide.pdf and b/docs/sphinx_setup/_static/download/GenAI_Quick_Start_Guide.pdf differ
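For reviewers, the ``LLMPipeline`` setup described in the ``genai-guide.rst`` change (a converted model folder, an inference device, and additional generation parameters) can be sketched in Python roughly as follows. This is a minimal illustration, not the guide's own example: it assumes the ``openvino-genai`` package is installed and that ``model_dir`` points to a model already exported for OpenVINO (for example with ``optimum-cli``). The helper name ``generate_text`` is hypothetical and not part of the GenAI API.

```python
from pathlib import Path


def generate_text(model_dir: str, prompt: str, device: str = "CPU",
                  max_new_tokens: int = 100) -> str:
    """Hypothetical helper: build an LLMPipeline and generate text for one prompt."""
    # Fail early with a clear message if the converted model folder is missing.
    if not Path(model_dir).is_dir():
        raise FileNotFoundError(f"converted model folder not found: {model_dir}")

    # Imported lazily so this module can be loaded without openvino-genai installed.
    import openvino_genai

    # The pipeline is constructed from the converted model folder and a device name.
    pipe = openvino_genai.LLMPipeline(model_dir, device)

    # Additional generation parameters are passed through a GenerationConfig object.
    config = openvino_genai.GenerationConfig()
    config.max_new_tokens = max_new_tokens

    # Depending on the GenAI version, generate() may return a str or a DecodedResults
    # object; str() is assumed to yield the generated text in both cases.
    return str(pipe.generate(prompt, config))
```

A call such as ``generate_text("path/to/converted-model", "What is OpenVINO?", device="GPU")`` would then target a GPU instead of the default CPU; the folder path here is a placeholder.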