[Inference over the network] Loadgen PR (mlcommons#1135)

* Add over the network specifications to README * Update and remove unexistent links * Fix issues in README * Remove normative language * Change title of network section Co-authored-by: rameshchukka <[email protected]>
ctuning · May 10, 2022 · f8500d6 · f8500d6
1 parent a8a43dc
commit f8500d6
Show file tree

Hide file tree

Showing 3 changed files with 69 additions and 9 deletions.
diff --git a/loadgen/README.md b/loadgen/README.md
@@ -1,14 +1,11 @@
 # Overview {#mainpage}
 
-*Note:* A compiled html version of this document is hosted online
-[here](https://mlperf.github.io/inference/loadgen/index.html).
-
 ## Introduction
 
 * The LoadGen is a *reusable* module that *efficiently* and *fairly* measures
   the performance of inference systems.
 * It generates traffic for scenarios as formulated by a diverse set of experts
-  in the [MLPerf working group](https://mlperf.org/about).
+  in the [MLCommons working group](https://mlcommons.org/).
 * The scenarios emulate the workloads seen in mobile devices,
   autonomous vehicles, robotics, and cloud-based setups.
 * Although the LoadGen is not model or dataset aware, its strength is in its
@@ -31,16 +28,15 @@ implemented.
 </div>
 
 ## Useful Links
-* [FAQ](@ref ReadmeFAQ)
-* [LoadGen Build Instructions](@ref ReadmeBuild)
-* [LoadGen API](@ref LoadgenAPI)
-* [Test Settings](@ref LoadgenAPITestSettings) -
+* [FAQ](README_FAQ.md)
+* [LoadGen Build Instructions](README_BUILD.md)
+* [LoadGen API](loadgen.h)
+* [Test Settings](test_settings.h) -
   A good description of available scenarios, modes, and knobs.
 * [MLPerf Inference Code](https://github.com/mlcommons/inference) -
   Includes source for the LoadGen and reference models that use the LoadGen.
 * [MLPerf Inference Rules](https://github.com/mlcommons/inference_policies) -
   Any mismatch with this is a bug in the LoadGen.
-* [MLPerf Website](www.mlperf.org)
 
 ## Scope of the LoadGen's Responsibilities
 
@@ -103,3 +99,67 @@ with the reference models.
 
 For templates of how to do the above in detail, refer to code for the demos,
 tests, and reference models.
+
+
+## LoadGen over the Network
+
+For reference, on a high level a submission looks like this:
+
+<div align="center" style="display:flex; flex-flow:row wrap; justify-content: space-evenly;">
+<img src="diagram_submission.png" width="300px" style="padding: 20px">
+</div>
+
+The LoadGen implementation is common to all submissions, while the QSL (“Query Sample Library”) and SUT (“System Under Test”) are implemented by submitters. QSL is responsible for loading the data and includes untimed preprocessing.
+
+A submission over the network introduces a new component “QDL” (query dispatch library) that is added to the system as presented in the following diagram:
+
+<div align="center" style="display:flex; flex-flow:row wrap; justify-content: space-evenly;">
+<img src="diagram_network_submission.png" width="300px" style="padding: 20px">
+</div>
+
+QDL is a proxy for a load-balancer, that dispatches queries to SUT over a physical network, receives the responses and passes them back to LoadGen.  It is implemented by the submitter. The interface of the QDL is the same as the API to SUT. 
+
+In scenarios using QDL, data may be compressed in QSL at the choice of the submitter in order to reduce network transmission time. Decompression is part of the timed processing in SUT. A set of approved standard compression schemes will be specified for each benchmark; additional compression schemes must be approved in advance by the Working Group.
+
+All communication between LoadGen/QSL and SUT is via QDL, and all communication between QDL and SUT must pass over a physical network.
+
+QDL implements the protocol to transmit queries over the network and receive responses. It also implements decompression of any response returned by the SUT, where compression of responses is allowed. Performing any part of the timed preprocessing or inference in QDL is specifically disallowed. Currently no batching is allowed in QDL, although this may be revisited in future.
+
+The MLperf over the Network will run in Server mode and Offline mode. All LoadGen modes are expected to work as is with insignificant changes. These include running the test in performance mode, accuracy mode, find peak performance mode and compliance mode. The same applies for power measurements.
+
+### QDL details
+The Query Dispatch Library is implemented by the submitter and interfaces with LoadGen using the same SUT API. All MLPerf Inference SUTs implement the `mlperf::SystemUnderTest` class which is defined in system_under_test.h. The QDL follows the same API, support all existing `mlperf::SystemUnderTest` methods and use the same header file.
+
+#### QDL Query issue and response over the network
+
+The QDL gets the queries from the LoadGen through 
+```CPP
+void IssueQuery(const std::vector<QuerySample>& samples)
+```
+
+The QDL dispatches the queries to the SUT over the physical media. The exact method and implementation for it are submitter specific and would not be specified at MLCommons. Submitter implementation includes all methods required to serialize the query, load balance, drive it to the Operating system and network interface card and send to the SUT.
+
+The QDL receives the query responses over the network from the SUT. The exact method and implementation for it are submitter specific and would not be specified at MLCommons. The submitter implementation includes all methods required to receive the network data from the Network Interface card, go through the Operating system, deserialize the query response, and provide it back to the LoadGen through query completion by:
+
+```CPP
+struct QuerySampleResponse {
+  ResponseId id;
+  uintptr_t data;
+  size_t size;
+};
+void QuerySamplesComplete(QuerySampleResponse* responses, 
+                          size_t response_count);
+
+```
+
+#### QDL Additional Methods
+
+In addition to that the QDL needs to implement the following methods that are provided by the SUT interface to the LoadGen:
+```CPP
+const std::string& Name() const;
+```
+The `Name` function returns a known string for over the Network SUTs to identify it as over the network benchmark.
+```CPP
+void FlushQueries();
+```
+It is not specified here how the QDL would query and configure the SUT to execute the above methods. The QDL provides the response to the loadgen after the SUT is ready.
diff --git a/loadgen/diagram_network_submission.png b/loadgen/diagram_network_submission.png
diff --git a/loadgen/diagram_submission.png b/loadgen/diagram_submission.png