Merge remote-tracking branch 'origin/master' into shape-infer/remove-cpu-custom-shape-infer-factories
praasz committed Jan 17, 2025
2 parents b7a89aa + 0848f86 commit 646e42b
Showing 55 changed files with 881 additions and 721 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -100,7 +100,7 @@ OpenVINO supports the CPU, GPU, and NPU [devices](https://docs.openvino.ai/2024/

## Generative AI with OpenVINO

Get started with the OpenVINO GenAI [installation](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html) and refer to the [detailed guide](https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html) to explore the capabilities of Generative AI using OpenVINO.
Get started with the OpenVINO GenAI [installation](https://docs.openvino.ai/2024/get-started/install-openvino/install-openvino-genai.html) and refer to the [detailed guide](https://docs.openvino.ai/2024/openvino-workflow-generative/generative-inference.html) to explore the capabilities of Generative AI using OpenVINO.

Learn how to run LLMs and GenAI with [Samples](https://github.com/openvinotoolkit/openvino.genai/tree/master/samples) in the [OpenVINO™ GenAI repo](https://github.com/openvinotoolkit/openvino.genai). See GenAI in action with Jupyter notebooks: [LLM-powered Chatbot](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-chatbot/README.md) and [LLM Instruction-following pipeline](https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-question-answering/README.md).

2 changes: 1 addition & 1 deletion docs/articles_en/about-openvino/key-features.rst
@@ -13,7 +13,7 @@ Easy Integration
:doc:`torch.compile <../openvino-workflow/torch-compile>` to improve model inference. Apply
OpenVINO optimizations to your PyTorch models directly with a single line of code.
| :doc:`GenAI Out Of The Box <../learn-openvino/llm_inference_guide/genai-guide>`
| :doc:`GenAI Out Of The Box <../openvino-workflow-generative/inference-with-genai>`
| With the genAI flavor of OpenVINO, you can run generative AI with just a couple lines of code.
Check out the GenAI guide for instructions on how to do it.
4 changes: 2 additions & 2 deletions docs/articles_en/documentation/openvino-ecosystem.rst
@@ -24,7 +24,7 @@ you an overview of a whole ecosystem of tools and solutions under the OpenVINO u

| **GenAI**
| :bdg-link-dark:`Github <https://github.com/openvinotoolkit/openvino.genai>`
:bdg-link-success:`User Guide <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html>`
:bdg-link-success:`User Guide <https://docs.openvino.ai/2024/openvino-workflow-generative/inference-with-genai.html>`
OpenVINO™ GenAI Library aims to simplify running inference of generative AI
models. Check the LLM-powered Chatbot Jupyter notebook to see how GenAI works.
@@ -113,7 +113,7 @@ generative AI and vision models directly on your computer or edge device using O

| **Tokenizers**
| :bdg-link-dark:`Github <https://github.com/openvinotoolkit/openvino_tokenizers>`
:bdg-link-success:`User Guide <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/ov-tokenizers.html>`
:bdg-link-success:`User Guide <https://docs.openvino.ai/2024/openvino-workflow-generative/ov-tokenizers.html>`
OpenVINO Tokenizers add text processing operations to OpenVINO.

@@ -27,5 +27,5 @@ Additional Resources
* :doc:`OpenVINO GenAI Installation Guide <../install-openvino/install-openvino-genai>`
* `OpenVINO GenAI repository <https://github.com/openvinotoolkit/openvino.genai>`__
* :doc:`OpenVINO Installation Guide <../install-openvino>`
* :doc:`OpenVINO Tokenizers <../../learn-openvino/llm_inference_guide/ov-tokenizers>`
* :doc:`OpenVINO Tokenizers <../../openvino-workflow-generative/ov-tokenizers>`

4 changes: 2 additions & 2 deletions docs/articles_en/get-started/install-openvino.rst
@@ -35,8 +35,8 @@ All currently supported versions are:
A new OpenVINO GenAI Flavor streamlines application development by providing
LLM-specific interfaces for easy integration of language models, handling tokenization and
text generation. For installation and usage instructions, proceed to
:doc:`Install OpenVINO GenAI Flavor <../learn-openvino/llm_inference_guide/genai-guide>` and
:doc:`Run LLMs with OpenVINO GenAI Flavor <../learn-openvino/llm_inference_guide/genai-guide>`.
:doc:`Install OpenVINO GenAI Flavor <../openvino-workflow-generative>` and
:doc:`Run LLMs with OpenVINO GenAI Flavor <../openvino-workflow-generative/inference-with-genai>`.

.. dropdown:: Building OpenVINO from Source

@@ -5,7 +5,7 @@ OpenVINO GenAI is a new flavor of OpenVINO, aiming to simplify running inference
It hides the complexity of the generation process and minimizes the amount of code required.
You can now provide a model and input context directly to OpenVINO, which performs tokenization of the
input text, executes the generation loop on the selected device, and returns the generated text.
For a quickstart guide, refer to the :doc:`GenAI API Guide <../../learn-openvino/llm_inference_guide/genai-guide>`.
For a quickstart guide, refer to the :doc:`GenAI API Guide <../../openvino-workflow-generative/inference-with-genai>`.
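
As a rough illustration of the flow described in this hunk (model and prompt in, generated text out), the sketch below wraps the `openvino_genai.LLMPipeline` Python API. It assumes that the `openvino-genai` package is installed and that `model_dir` points to a model already exported to OpenVINO IR together with its tokenizer. The helper name `generate_text` and the example path are illustrative only and are not part of the library.

```python
def generate_text(model_dir: str, prompt: str,
                  device: str = "CPU", max_new_tokens: int = 100) -> str:
    """Run one prompt through an OpenVINO GenAI LLM pipeline (sketch)."""
    # Imported lazily so this module can be loaded without the package present.
    import openvino_genai

    # The pipeline loads the model and its tokenizer from model_dir and
    # compiles them for the selected device.
    pipe = openvino_genai.LLMPipeline(model_dir, device)

    # Tokenization, the generation loop, and detokenization all happen
    # inside generate(); str() normalizes the returned result to plain text.
    return str(pipe.generate(prompt, max_new_tokens=max_new_tokens))


if __name__ == "__main__":
    # "./my-model-ov" is a hypothetical directory, not a path from this commit.
    print(generate_text("./my-model-ov", "The Sun is yellow because"))
```

The import is kept inside the function so the helper can be defined in environments where only the base OpenVINO package is available; the call itself still requires the GenAI flavor.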

To see GenAI in action, check the Jupyter notebooks:
`LLM-powered Chatbot <https://github.com/openvinotoolkit/openvino_notebooks/blob/latest/notebooks/llm-chatbot/README.md>`__ and
@@ -28,7 +28,7 @@ but use the *openvino-genai* package instead of *openvino*:
Archive Installation
###############################

The OpenVINO GenAI archive package includes the OpenVINO™ Runtime and :doc:`Tokenizers <../../learn-openvino/llm_inference_guide/ov-tokenizers>`.
The OpenVINO GenAI archive package includes the OpenVINO™ Runtime and :doc:`Tokenizers <../../openvino-workflow-generative/ov-tokenizers>`.
To install the GenAI flavor of OpenVINO from an archive file, follow the standard installation steps for your system
but instead of using the vanilla package file, download the one with OpenVINO GenAI:

4 changes: 0 additions & 4 deletions docs/articles_en/learn-openvino.rst
@@ -14,7 +14,6 @@ Learn OpenVINO

Interactive Tutorials (Python) <learn-openvino/interactive-tutorials-python>
Sample Applications (Python & C++) <learn-openvino/openvino-samples>
Generative AI workflow <learn-openvino/llm_inference_guide>



@@ -28,6 +27,3 @@ as well as an experienced user.
| :doc:`OpenVINO Samples <learn-openvino/openvino-samples>`
| The OpenVINO samples (Python and C++) are simple console applications that show how to use specific OpenVINO API features. They can assist you in executing tasks such as loading a model, running inference, querying particular device capabilities, etc.
| :doc:`Generative AI workflow <learn-openvino/llm_inference_guide>`
| Detailed information on how OpenVINO accelerates Generative AI use cases and what models it supports. This tutorial provides instructions for running Generative AI models using Hugging Face Optimum Intel and Native OpenVINO APIs.
@@ -9,10 +9,10 @@ Generative AI workflow
:maxdepth: 1
:hidden:

Generative Model Preparation <llm_inference_guide/genai-model-preparation>
Inference with OpenVINO GenAI <llm_inference_guide/genai-guide>
Inference with Optimum Intel <llm_inference_guide/llm-inference-hf>
OpenVINO Tokenizers <llm_inference_guide/ov-tokenizers>
Generative Model Preparation <openvino-workflow-generative/genai-model-preparation>
Inference with OpenVINO GenAI <openvino-workflow-generative/inference-with-genai>
Inference with Optimum Intel <openvino-workflow-generative/inference-with-optimum-intel>
OpenVINO Tokenizers <openvino-workflow-generative/ov-tokenizers>



@@ -58,7 +58,7 @@ options:
Note that the base version of OpenVINO may also be used to run generative AI. Although it may
offer a simpler environment, with fewer dependencies, it has significant limitations and a more
demanding implementation process. For reference, see
`the article on generative AI usage of OpenVINO 2024.6 <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-native-ov.html>`__.
`the article on generative AI usage of OpenVINO 2024.6 <https://docs.openvino.ai/2024/openvino-workflow-generative/llm-inference-native-ov.html>`__.

The advantages of using OpenVINO for generative model deployment:

@@ -90,8 +90,8 @@ The advantages of using OpenVINO for generative model deployment:

Proceed to guides on:

* :doc:`OpenVINO GenAI Flavor <./llm_inference_guide/genai-guide>`
* :doc:`Hugging Face and Optimum Intel <./llm_inference_guide/llm-inference-hf>`
* `Generative AI with Base OpenVINO <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-native-ov.html>`__
* :doc:`OpenVINO GenAI Flavor <./openvino-workflow-generative/inference-with-genai>`
* :doc:`Hugging Face and Optimum Intel <./openvino-workflow-generative/inference-with-optimum-intel>`
* `Generative AI with Base OpenVINO <https://docs.openvino.ai/2024/openvino-workflow-generative/llm-inference-native-ov.html>`__
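
The renames visible in these hunks follow a regular pattern: the `learn-openvino/llm_inference_guide` directory becomes `openvino-workflow-generative`, and three pages also change their file names. The sketch below captures that pattern as a small rewrite helper. The function name `migrate_doc_path` is hypothetical, and the mapping covers only the renames shown in this diff. It does not adjust relative depth (for example, the `../../assets` to `../assets` image changes), bare `./llm_inference_guide/...` references, or one-off retargets such as the "Install OpenVINO GenAI Flavor" link.

```python
import re

OLD_PREFIX = "learn-openvino/llm_inference_guide"
NEW_PREFIX = "openvino-workflow-generative"

# Pages whose file name changed in addition to the directory move.
RENAMED_PAGES = {
    "genai-guide": "inference-with-genai",
    "genai-guide-npu": "inference-with-genai-on-npu",
    "llm-inference-hf": "inference-with-optimum-intel",
}

# Match the old directory, optionally followed by "/<page-name>".
_PATTERN = re.compile(re.escape(OLD_PREFIX) + r"(?:/(?P<page>[A-Za-z0-9_-]+))?")


def migrate_doc_path(text: str) -> str:
    """Rewrite old documentation paths or URLs in text to the new layout."""

    def _swap(match: re.Match) -> str:
        page = match.group("page")
        if page is None:
            # Reference to the section index itself.
            return NEW_PREFIX
        # Pages not listed in RENAMED_PAGES keep their name.
        return f"{NEW_PREFIX}/{RENAMED_PAGES.get(page, page)}"

    return _PATTERN.sub(_swap, text)


if __name__ == "__main__":
    old = "https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/genai-guide.html"
    print(migrate_doc_path(old))
```

File extensions and anchors (such as `.html#enabling-openvino-runtime-optimizations`) are left untouched because the pattern stops at the first character that cannot be part of a page name.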


@@ -8,15 +8,15 @@ Inference with OpenVINO GenAI
:maxdepth: 1
:hidden:

NPU inference of LLMs <genai-guide-npu>
NPU inference of LLMs <inference-with-genai-on-npu>


OpenVINO™ GenAI is a library of pipelines and methods, extending the OpenVINO runtime to work
with generative AI models more efficiently. This article provides reference code and guidance
on its usage. Note that the base OpenVINO version will not work with these instructions,
make sure to :doc:`install OpenVINO with GenAI <../../get-started/install-openvino/install-openvino-genai>`.

.. image:: ../../assets/images/genai_main_diagram.svg
.. image:: ../assets/images/genai_main_diagram.svg
:align: center
:alt: OpenVINO GenAI workflow diagram

@@ -6,7 +6,7 @@ generation with LLMs. Tokenizers convert the input text into a sequence of token
corresponding IDs, so that the model can understand and process it during inference. The
transformation of a sequence of numbers into a string is called detokenization.
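
To make the two directions concrete, here is a deliberately simplified, self-contained sketch of tokenization and detokenization. It uses a toy whitespace vocabulary and does not use the OpenVINO Tokenizers API; real tokenizers typically work on subword units. All function names are illustrative.

```python
def build_vocab(corpus: str) -> dict:
    """Assign an integer ID to each distinct whitespace-separated word."""
    vocab = {"<unk>": 0}  # ID 0 is reserved for unknown words
    for word in corpus.split():
        vocab.setdefault(word, len(vocab))
    return vocab


def tokenize(text: str, vocab: dict) -> list:
    """Text -> token IDs; words missing from the vocabulary map to <unk>."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.split()]


def detokenize(ids: list, vocab: dict) -> str:
    """Token IDs -> text, by inverting the vocabulary mapping."""
    id_to_token = {token_id: token for token, token_id in vocab.items()}
    return " ".join(id_to_token[token_id] for token_id in ids)


if __name__ == "__main__":
    vocab = build_vocab("the sun is yellow")
    ids = tokenize("the sun is yellow", vocab)
    print(ids)
    print(detokenize(ids, vocab))
```

The round trip only reproduces the input when every word is in the vocabulary, which is one reason a tokenizer has to match the model it was trained with.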

.. image:: ../../assets/images/tokenization.svg
.. image:: ../assets/images/tokenization.svg
:align: center

There are two important points in the tokenizer-model relation:
@@ -105,7 +105,7 @@ By default, weights are compressed asymmetrically to "INT8_ASYM" mode.
print(results)
For more details, refer to the article on how to
:doc:`infer LLMs using Optimum Intel <../../learn-openvino/llm_inference_guide/llm-inference-hf>`.
:doc:`infer LLMs using Optimum Intel <../../openvino-workflow-generative/inference-with-optimum-intel>`.

.. tab-item:: Compression with NNCF
:sync: nncf
@@ -221,7 +221,7 @@ depending on the model.
For more details, refer to the article on how to
:doc:`infer LLMs using Optimum Intel <../../../learn-openvino/llm_inference_guide/llm-inference-hf>`.
:doc:`infer LLMs using Optimum Intel <../../../openvino-workflow-generative/inference-with-optimum-intel>`.

The code snippet below shows how to do 4-bit quantization of the model weights represented
in OpenVINO IR using NNCF:
@@ -344,7 +344,7 @@ load the compressed model later for faster time to first inference.
.. tip::

Models optimized with NNCF or Optimum Intel can be used with
:doc:`OpenVINO GenAI <../../learn-openvino/llm_inference_guide/genai-guide>`.
:doc:`OpenVINO GenAI <../../openvino-workflow-generative/inference-with-genai>`.


Auto-tuning of Weight Compression Parameters
@@ -66,7 +66,7 @@ from the application code to OpenVINO and all related internal work is hidden fr

There are three methods of turning an OpenVINO model into a stateful one:

* :doc:`Optimum-Intel <../../learn-openvino/llm_inference_guide/llm-inference-hf>` - the most user-friendly option. All necessary optimizations
* :doc:`Optimum-Intel <../../openvino-workflow-generative/inference-with-optimum-intel>` - the most user-friendly option. All necessary optimizations
are recognized and applied automatically. The drawback is, the tool does not work with all
models.

@@ -10,7 +10,7 @@ and you have three ways to do it:

* `Optimum-Intel <https://github.com/huggingface/optimum-intel>`__ - an automated solution
applicable to a selection of models (not covered by this article, for a usage guide
refer to the :doc:`LLM Inference with Hugging Face and Optimum Intel <../../../learn-openvino/llm_inference_guide>` article).
refer to the :doc:`LLM Inference with Hugging Face and Optimum Intel <../../../openvino-workflow-generative>` article).
* :ref:`MakeStateful transformation <ov_ug_make_stateful>` - to choose which pairs of
Parameter and Result to replace.
* :ref:`LowLatency2 transformation <ov_ug_low_latency>` - to detect and replace Parameter
2 changes: 1 addition & 1 deletion docs/notebooks/llm-agent-functioncall-qwen-with-output.rst
@@ -258,7 +258,7 @@ pipeline.
You can get additional inference speed improvement with `Dynamic
Quantization of activations and KV-cache quantization on
CPU <https://docs.openvino.ai/2024/learn-openvino/llm_inference_guide/llm-inference-hf.html#enabling-openvino-runtime-optimizations>`__.
CPU <https://docs.openvino.ai/2024/openvino-workflow-generative/inference-with-optimum-intel.html#enabling-openvino-runtime-optimizations>`__.
These options can be enabled with ``ov_config`` as follows:

.. code:: ipython3