From 131c01bac8ac80774b219c5e251e75e3a79fa826 Mon Sep 17 00:00:00 2001
From: Karol Blaszczak
Date: Thu, 3 Oct 2024 12:17:30 +0200
Subject: [PATCH] [DOCS] benchmark content restructuring

---
 .../about-openvino/performance-benchmarks.rst |  94 +++-----
 .../getting-performance-numbers.rst           | 208 +++++++++++++-----
 2 files changed, 189 insertions(+), 113 deletions(-)

diff --git a/docs/articles_en/about-openvino/performance-benchmarks.rst b/docs/articles_en/about-openvino/performance-benchmarks.rst
index aa60c44a2ad5c8..0447f12bf27f0a 100644
--- a/docs/articles_en/about-openvino/performance-benchmarks.rst
+++ b/docs/articles_en/about-openvino/performance-benchmarks.rst
@@ -36,7 +36,7 @@ For a more detailed view of performance numbers for generative AI models, check
       :outline:
       :expand:
 
-      :material-regular:`bar_chart;1.4em` OpenVINO Benchmark Graphs
+      :material-regular:`bar_chart;1.4em` OpenVINO Benchmark Graphs (general)
 
    .. grid-item::
 
@@ -46,10 +46,34 @@ For a more detailed view of performance numbers for generative AI models, check
       :outline:
       :expand:
 
-      :material-regular:`bar_chart;1.4em` OVMS Benchmark Graphs
+      :material-regular:`bar_chart;1.4em` OVMS Benchmark Graphs (computer vision)
 
+   .. grid-item::
+
+      .. button-link:: ./generative-ai-performance.html
+         :class: ov-toolkit-benchmark-genai
+         :color: primary
+         :outline:
+         :expand:
+
+         :material-regular:`bar_chart;1.4em` OpenVINO Benchmark Data (GenAI)
+
+   .. grid-item::
+
+      .. button-link::
+         :class: ovms-toolkit-benchmark-llm
+         :color: primary
+         :outline:
+         :expand:
+
+         :material-regular:`bar_chart;1.4em` OVMS Benchmark Graphs (LLM)
+
+
+
 
-Key performance indicators and workload parameters.
+**Key performance indicators and workload parameters**
 
 .. tab-set::
 
@@ -97,9 +121,7 @@ Key performance indicators and workload parameters.
 
    * input token length: 1024 (the tokens for GenAI models are in English).
 
-.. raw:: html
-
-   <h2>Platforms, Configurations, Methodology</h2>
-
+**Platforms, Configurations, Methodology**
 
 For a listing of all platforms and configurations used for testing, refer to the following:
 
@@ -130,59 +152,13 @@ For a listing of all platforms and configurations used for testing, refer to the
 
       :material-regular:`download;1.5em` Click for Performance Data [XLSX]
 
-The OpenVINO benchmark setup includes a single system with OpenVINO™, as well as the benchmark
-application installed. It measures the time spent on actual inference (excluding any pre or post
-processing) and then reports on the inferences per second (or Frames Per Second).
-
-OpenVINO™ Model Server (OVMS) employs the Intel® Distribution of OpenVINO™ toolkit runtime
-libraries and exposes a set of models via a convenient inference API over gRPC or HTTP/REST.
-Its benchmark results are measured with the configuration of multiple-clients-single-server,
-using two hardware platforms connected by ethernet. Network bandwidth depends on both platforms
-and models used. It is set not to be a bottleneck for workload intensity. The connection is
-dedicated only to measuring performance.
+To see the methodology used to obtain the numbers and learn how to get your own numbers,
+see the guide on :doc:`getting performance numbers <performance-benchmarks/getting-performance-numbers>`.
 
-.. dropdown:: See more details about OVMS benchmark setup
-
-   The benchmark setup for OVMS consists of four main parts:
-
-   .. image:: ../assets/images/performance_benchmarks_ovms_02.png
-      :alt: OVMS Benchmark Setup Diagram
-
-   * **OpenVINO™ Model Server** is launched as a docker container on the server platform and it
-     listens to (and answers) requests from clients. OpenVINO™ Model Server is run on the same
-     system as the OpenVINO™ toolkit benchmark application in corresponding benchmarking. Models
-     served by OpenVINO™ Model Server are located in a local file system mounted into the docker
-     container. The OpenVINO™ Model Server instance communicates with other components via ports
-     over a dedicated docker network.
-
-   * **Clients** are run in separated physical machine referred to as client platform. Clients
-     are implemented in Python3 programming language based on TensorFlow* API and they work as
-     parallel processes. Each client waits for a response from OpenVINO™ Model Server before it
-     will send a new next request. The role played by the clients is also verification of
-     responses.
-
-   * **Load balancer** works on the client platform in a docker container. HAProxy is used for
-     this purpose. Its main role is counting of requests forwarded from clients to OpenVINO™
-     Model Server, estimating its latency, and sharing this information by Prometheus service.
-     The reason of locating the load balancer on the client site is to simulate real life
-     scenario that includes impact of physical network on reported metrics.
-
-   * **Execution Controller** is launched on the client platform. It is responsible for
-     synchronization of the whole measurement process, downloading metrics from the load
-     balancer, and presenting the final report of the execution.
-
-
-
-.. raw:: html
-
-   <h2>Test performance yourself</h2>
-
-
-You can also test performance for your system yourself, following the guide on
-:doc:`getting performance numbers <performance-benchmarks/getting-performance-numbers>`.
-
-.. raw:: html
-
-   <h2>Disclaimers</h2>
-
+**Disclaimers**
 
 * Intel® Distribution of OpenVINO™ toolkit performance results are based on release
   2024.3, as of July 31, 2024.
 
 The results may not reflect all publicly available updates. Intel technologies' features and
 benefits depend on system configuration and may require enabled hardware, software, or service
-activation. Learn more at intel.com, or from the OEM or retailer.
+activation. Learn more at intel.com, the OEM, or retailer.
 
 See configuration disclosure for details. No product can be absolutely secure.
 
 Performance varies by use, configuration and other factors. Learn more at
 `www.intel.com/PerformanceIndex <https://www.intel.com/PerformanceIndex>`__.
 
-Your costs and results may vary.
 
 Intel optimizations, for Intel compilers or other products, may not optimize to the same
 degree for non-Intel products.
 
-
-
-
-
-
 .. raw:: html
 
diff --git a/docs/articles_en/about-openvino/performance-benchmarks/getting-performance-numbers.rst b/docs/articles_en/about-openvino/performance-benchmarks/getting-performance-numbers.rst
index 069c940063cf14..f905cca65b5995 100644
--- a/docs/articles_en/about-openvino/performance-benchmarks/getting-performance-numbers.rst
+++ b/docs/articles_en/about-openvino/performance-benchmarks/getting-performance-numbers.rst
@@ -1,25 +1,161 @@
 Getting Performance Numbers
 ===========================
 
+1. `Benchmarking methodology for OpenVINO <#benchmarking-methodology-for-openvino>`__
+
+   a. `OpenVINO benchmarking (general) <#openvino-benchmarking-general>`__
+   b. `OpenVINO Model Server benchmarking (general) <#openvino-model-server-benchmarking-general>`__
+   c. `OpenVINO Model Server benchmarking (LLM) <#openvino-model-server-benchmarking-llm>`__
+
+2. `How to obtain benchmark results <#how-to-obtain-benchmark-results>`__
+
+   a.
+   b.
+   c.
+
+
+Benchmarking methodology for OpenVINO
+###############################################################################################
+
+OpenVINO benchmarking (general)
++++++++++++++++++++++++++++++++++++++
+
+The OpenVINO benchmark setup consists of a single system on which both OpenVINO™ and the
+benchmark application are installed. It measures the time spent on actual inference
+(excluding any pre- or post-processing) and reports it as inferences per second
+(or frames per second).
+
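+In practice, this measurement reduces to timing a bare inference loop. A minimal sketch with
+the OpenVINO Python API is shown below (the model path, device, and input shape are
+placeholders):
+
+.. code-block:: python
+
+   import time
+   import numpy as np
+   import openvino as ov
+
+   core = ov.Core()
+   compiled = core.compile_model("model.xml", "CPU")    # loading/compilation is not measured
+   request = compiled.create_infer_request()
+   data = np.zeros((1, 3, 224, 224), dtype=np.float32)  # placeholder input
+
+   iterations = 100
+   start = time.perf_counter()
+   for _ in range(iterations):
+       request.infer({0: data})                         # only inference is timed
+   elapsed = time.perf_counter() - start
+   print(f"{iterations / elapsed:.2f} FPS")
+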
+
+OpenVINO Model Server benchmarking (general)
+++++++++++++++++++++++++++++++++++++++++++++
+
+OpenVINO™ Model Server (OVMS) employs the Intel® Distribution of OpenVINO™ toolkit runtime
+libraries and exposes a set of models via a convenient inference API over gRPC or HTTP/REST.
+Its benchmark results are measured in a multiple-clients-single-server configuration, using
+two hardware platforms connected by Ethernet. Network bandwidth depends on both the platforms
+and the models used. It is set so that it is not a bottleneck for workload intensity. The
+connection is dedicated only to measuring performance.
+
+.. dropdown:: See more details about OVMS benchmark setup
+
+   The benchmark setup for OVMS consists of four main parts:
+
+   .. image:: ../../assets/images/performance_benchmarks_ovms_02.png
+      :alt: OVMS Benchmark Setup Diagram
+
+   * **OpenVINO™ Model Server** is launched as a Docker container on the server platform and
+     listens for (and answers) requests from clients. In the corresponding benchmark runs, it
+     is run on the same system as the OpenVINO™ toolkit benchmark application. Models served
+     by OpenVINO™ Model Server are located in a local file system mounted into the Docker
+     container. The OpenVINO™ Model Server instance communicates with other components via
+     ports over a dedicated Docker network.
+
+   * **Clients** run on a separate physical machine referred to as the client platform. They
+     are implemented in Python, based on the TensorFlow API, and work as parallel processes.
+     Each client waits for a response from OpenVINO™ Model Server before it sends the next
+     request. The clients are also responsible for verifying the responses.
+
+   * **Load balancer** works on the client platform in a Docker container. HAProxy is used
+     for this purpose. Its main role is counting the requests forwarded from clients to
+     OpenVINO™ Model Server, estimating their latency, and exposing this information through
+     a Prometheus service. The load balancer is located on the client side to simulate a
+     real-life scenario that includes the impact of the physical network on reported metrics.
+
+   * **Execution Controller** is launched on the client platform. It is responsible for
+     synchronizing the whole measurement process, downloading metrics from the load
+     balancer, and presenting the final report of the execution.
+
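+For illustration, the request loop of a single client reduces to the following sketch (shown
+with the ``ovmsclient`` package rather than the TensorFlow API used in the measured setup;
+the server address, input name, and model name are placeholders):
+
+.. code-block:: python
+
+   import numpy as np
+   from ovmsclient import make_grpc_client
+
+   client = make_grpc_client("localhost:9000")           # placeholder server address
+   data = np.zeros((1, 3, 224, 224), dtype=np.float32)   # placeholder input
+
+   for _ in range(1000):
+       # wait for the response before sending the next request...
+       output = client.predict(inputs={"0": data}, model_name="resnet")
+       # ...and verify it on the client side
+       assert output is not None
+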
+
+OpenVINO Model Server benchmarking (LLM)
+++++++++++++++++++++++++++++++++++++++++
+
+Large Language Models require a different benchmarking approach than static models. Instead
+of a single latency figure, generation is typically characterized by the time to first token
+and the rate at which the subsequent tokens are produced.
+
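+For illustration, both metrics can be collected from any client that streams tokens
+(``stream_tokens`` below is a hypothetical stand-in for a streaming generation call):
+
+.. code-block:: python
+
+   import time
+
+   def measure_generation(stream_tokens, prompt):
+       """Return time to first token (s) and decode throughput (tokens/s)."""
+       start = time.perf_counter()
+       first_token_time = None
+       count = 0
+       for _token in stream_tokens(prompt):   # hypothetical streaming client
+           count += 1
+           if first_token_time is None:
+               first_token_time = time.perf_counter() - start
+       total = time.perf_counter() - start
+       decode_rate = (count - 1) / (total - first_token_time) if count > 1 else 0.0
+       return first_token_time, decode_rate
+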
+
+
+How to obtain benchmark results
+###############################################################################################
+
+General guidance
++++++++++++++++++
+
+.. dropdown:: Select a proper set of operations to measure
+
+   When evaluating performance of a model with OpenVINO Runtime, it is required to measure a
+   proper set of operations.
+
+   * Avoid including one-time costs such as model loading.
+   * Track operations that occur outside OpenVINO Runtime, such as video decoding, separately.
+
+   .. note::
+
+      Some image pre-processing can be baked into OpenVINO IR and accelerated accordingly.
+      For more information, refer to
+      :doc:`Embedding Pre-processing <../../documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/[legacy]-embedding-preprocessing-computation>`
+      and
+      :doc:`General Runtime Optimizations <../../openvino-workflow/running-inference/optimize-inference/general-optimizations>`.
+
+.. dropdown:: Maximize the chance to obtain credible data
+
+   Performance conclusions should be built on reproducible data. Performance measurements
+   should be done with a large number of invocations of the same routine. Since the first
+   iteration is almost always significantly slower than the subsequent ones, an aggregated
+   value can be used for the execution time in final projections:
+
+   * If the warm-up run does not help or execution times still vary, you can try running a
+     large number of iterations and then use the mean value of the results.
+   * If time values differ too much, consider using a geomean.
+   * Be aware of potential power-related irregularities, such as throttling. A device may
+     assume one of several different power states, so it is advisable to fix its frequency
+     when optimizing, for better performance data reproducibility.
+   * Note that end-to-end application benchmarking should also be performed under real
+     operational conditions.
+
+   A minimal sketch of this warm-up and aggregation is shown below.
+
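+For example (``run_once`` is a placeholder for the measured routine, such as a single
+inference request):
+
+.. code-block:: python
+
+   import time
+   from statistics import fmean, geometric_mean
+
+   def benchmark(run_once, warmup=10, iterations=200):
+       for _ in range(warmup):                # discard the slow first iterations
+           run_once()
+       samples = []
+       for _ in range(iterations):
+           start = time.perf_counter()
+           run_once()
+           samples.append(time.perf_counter() - start)
+       # mean for stable runs, geomean if the values still differ too much
+       return fmean(samples), geometric_mean(samples)
+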
+.. dropdown:: Compare performance with native/framework code
+
+   When comparing OpenVINO Runtime performance with the framework or reference code,
+   make sure that both versions are as similar as possible:
+
+   * Wrap the exact inference execution (for an example, see the :doc:`Benchmark app <../../learn-openvino/openvino-samples/benchmark-tool>`).
+   * Do not include model loading time.
+   * Ensure that the inputs are identical for OpenVINO Runtime and the framework. For
+     example, watch out for random values that can be used to populate the inputs.
+   * In situations when any user-side pre-processing should be tracked separately,
+     consider :doc:`image pre-processing and conversion <../../openvino-workflow/running-inference/optimize-inference/optimize-preprocessing>`.
+   * When applicable, leverage the :doc:`Dynamic Shapes support <../../openvino-workflow/running-inference/dynamic-shapes>`.
+   * If possible, demand the same accuracy. For example, TensorFlow allows ``FP16``
+     execution, so when comparing to that, make sure to test OpenVINO Runtime with ``FP16``
+     as well.
+
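+One way to keep such a comparison fair is to fix the random inputs and time only the
+inference call in both stacks (a sketch; the input shape and the two inference callables
+are placeholders):
+
+.. code-block:: python
+
+   import time
+   import numpy as np
+
+   def mean_latency(infer, data, iterations=100):
+       """Time only the inference callable, excluding any model loading."""
+       infer(data)                                    # warm-up run
+       start = time.perf_counter()
+       for _ in range(iterations):
+           infer(data)
+       return (time.perf_counter() - start) / iterations
+
+   rng = np.random.default_rng(seed=0)                # fixed seed: identical inputs everywhere
+   data = rng.random((1, 3, 224, 224)).astype(np.float32)
+
+   # ov_latency = mean_latency(lambda x: ov_request.infer({0: x}), data)
+   # fw_latency = mean_latency(framework_infer, data)   # placeholder framework callable
+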
-This guide explains how to use the benchmark_app to get performance numbers. It also explains how the performance
-numbers are reflected through internal inference performance counters and execution graphs. It also includes
-information on using ITT and Intel® VTune™ Profiler to get performance insights.
-
-.. raw:: html
-
-   <h2>Test performance with the benchmark_app</h2>
-
-You can run OpenVINO benchmarks in both C++ and Python APIs, yet the experience differs in each case.
-The Python one is part of OpenVINO Runtime installation, while C++ is available as a code sample.
-For a detailed description, see: :doc:`benchmark_app <../../learn-openvino/openvino-samples/benchmark-tool>`.
-
-Make sure to install the latest release package with support for frameworks of the models you want to test.
-For the most reliable performance benchmarks, :doc:`prepare the model for use with OpenVINO <../../openvino-workflow/model-preparation>`.
+
+
+OpenVINO benchmarking (general)
++++++++++++++++++++++++++++++++
+
+The default way of measuring OpenVINO performance is running a piece of code, referred to as
+:doc:`the benchmark tool <../../learn-openvino/openvino-samples/benchmark-tool>`.
+For Python, it is part of the OpenVINO Runtime installation, while for C++, it is available
+as a code sample.
+
+Make sure to install the latest release package with support for the frameworks of the models
+you want to test. For the most reliable performance benchmarks,
+:doc:`prepare the model for use with OpenVINO <../../openvino-workflow/model-preparation>`.
+
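+For example, a basic throughput run on CPU may look as follows (a sketch invoking the tool
+from Python; the model path and duration are placeholders):
+
+.. code-block:: python
+
+   import subprocess
+
+   # benchmark_app is installed together with the OpenVINO Python package
+   subprocess.run(
+       ["benchmark_app",
+        "-m", "model.xml",        # model prepared for use with OpenVINO
+        "-d", "CPU",              # target device
+        "-hint", "throughput",    # or "latency", depending on the use case
+        "-t", "15"],              # run for 15 seconds
+       check=True,
+   )
+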
@@ -50,60 +186,20 @@ it is recommended to always start performance evaluation with the :doc:`OpenVINO
 
-.. raw:: html
-
-   <h2>Additional benchmarking considerations</h2>
-
-
-
-
-.. raw:: html
-
-   <h3>1 - Select a Proper Set of Operations to Measure</h3>
-
-When evaluating performance of a model with OpenVINO Runtime, it is required to measure a proper set of operations.
-
-- Avoid including one-time costs such as model loading.
-- Track operations that occur outside OpenVINO Runtime (such as video decoding) separately.
-
-.. note::
-
-   Some image pre-processing can be baked into OpenVINO IR and accelerated accordingly. For more information,
-   refer to :doc:`Embedding Pre-processing <../../documentation/legacy-features/transition-legacy-conversion-api/legacy-conversion-api/[legacy]-embedding-preprocessing-computation>` and
-   :doc:`General Runtime Optimizations <../../openvino-workflow/running-inference/optimize-inference/general-optimizations>`.
-
-.. raw:: html
-
-   <h3>2 - Try to Get Credible Data</h3>
-
-
-Performance conclusions should be build upon reproducible data. As for the performance measurements, they should
-be done with a large number of invocations of the same routine. Since the first iteration is almost always significantly
-slower than the subsequent ones, an aggregated value can be used for the execution time for final projections:
-
-- If the warm-up run does not help or execution time still varies, you can try running a large number of iterations
-  and then average or find a mean of the results.
-- If the time values range too much, consider geomean.
-- Be aware of the throttling and other power oddities. A device can exist in one of several different power states.
-  When optimizing your model, consider fixing the device frequency for better performance data reproducibility.
-  However, the end-to-end (application) benchmarking should also be performed under real operational conditions.
-
-
-
-.. raw:: html
-
-   <h3>3 - Compare Performance with Native/Framework Code</h3>
-
-
-When comparing the OpenVINO Runtime performance with the framework or another reference code, make sure that both versions are as similar as possible:
-
+.. raw:: html
+
+   <h2>Additional benchmarking considerations</h2>
+
-- Wrap the exact inference execution (for examples, see :doc:`Benchmark app <../../learn-openvino/openvino-samples/benchmark-tool>`).
-- Do not include model loading time.
-- Ensure that the inputs are identical for OpenVINO Runtime and the framework. For example, watch out for random values that can be used to populate the inputs.
-- In situations when any user-side pre-processing should be tracked separately, consider :doc:`image pre-processing and conversion <../../openvino-workflow/running-inference/optimize-inference/optimize-preprocessing>`.
-- When applicable, leverage the :doc:`Dynamic Shapes support <../../openvino-workflow/running-inference/dynamic-shapes>`.
-- If possible, demand the same accuracy. For example, TensorFlow allows ``FP16`` execution, so when comparing to that, make sure to test the OpenVINO Runtime with the ``FP16`` as well.
 
 .. raw:: html
 
@@ -173,6 +269,16 @@ insights in the application-level performance on the timeline view.
 
 
+
+
+
+
+
+
+
+
+
+
 .. raw:: html
 