Add manual benchmark workflow and S3 result persistence (#429)
Hi everyone,

This pull request introduces a class and a workflow designed to store
the results of a benchmark run in an S3 bucket.

The key used for storage consists of the identifier of the benchmark
itself, the branch used, the release version, the current date, and the
commit hash, in that order. Furthermore, the boto3 package is added to
interact with AWS components.

I look forward to your feedback.
fabianliebig authored Nov 27, 2024
2 parents db5513a + d260235 commit f4669a2
Showing 14 changed files with 471 additions and 12 deletions.
54 changes: 54 additions & 0 deletions .github/workflows/manual_benchmark.yml
@@ -0,0 +1,54 @@
name: Run Benchmark

on:
  workflow_dispatch:

permissions:
  contents: read
  id-token: write

jobs:
  add-runner:
    runs-on: ubuntu-latest
    steps:
      - name: Generate a token
        id: generate-token
        uses: actions/create-github-app-token@v1
        with:
          app-id: ${{ vars.APP_ID }}
          private-key: ${{ secrets.APP_PRIVATE_KEY }}
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: ${{ secrets.AWS_ROLE_TO_ASSUME }}
          role-session-name: Github_Add_Runner
          aws-region: eu-central-1
      - name: Login to Amazon ECR
        id: login-ecr
        uses: aws-actions/amazon-ecr-login@v2
      - name: Execute Lambda function
        run: |
          aws lambda invoke --function-name jit_runner_register_and_create_runner_container --cli-binary-format raw-in-base64-out --payload '{"github_api_secret": "${{ steps.generate-token.outputs.token }}", "count_container": 1, "container_compute": "XL", "repository": "${{ github.repository }}" }' response.json
          cat response.json
          if ! grep -q '"statusCode": 200' response.json; then
            echo "Lambda function failed. statusCode is not 200."
            exit 1
          fi
  benchmark-test:
    needs: add-runner
    runs-on: self-hosted
    env:
      BAYBE_BENCHMARKING_PERSISTENCE_PATH: ${{ secrets.TEST_RESULT_S3_BUCKET }}
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      - uses: actions/setup-python@v5
        id: setup-python
        with:
          python-version: "3.10"
      - name: Benchmark
        run: |
          pip install '.[benchmarking]'
          python -m benchmarks
17 changes: 16 additions & 1 deletion .lockfiles/py310-dev.lock
@@ -47,6 +47,12 @@ blinker==1.8.2
    # via streamlit
boolean-py==4.0
    # via license-expression
boto3==1.35.68
    # via baybe (pyproject.toml)
botocore==1.35.68
    # via
    #   boto3
    #   s3transfer
botorch==0.11.3
    # via baybe (pyproject.toml)
cachecontrol==0.14.0
@@ -264,6 +270,10 @@ jinja2==3.1.4
    #   pydeck
    #   sphinx
    #   torch
jmespath==1.0.1
    # via
    #   boto3
    #   botocore
joblib==1.4.2
    # via
    #   baybe (pyproject.toml)
@@ -700,6 +710,7 @@ pytest-cov==5.0.0
python-dateutil==2.9.0.post0
    # via
    #   arrow
    #   botocore
    #   jupyter-client
    #   matplotlib
    #   pandas
@@ -768,6 +779,8 @@ rpds-py==0.19.0
    #   referencing
ruff==0.5.2
    # via baybe (pyproject.toml)
s3transfer==0.10.4
    # via boto3
scikit-fingerprints==1.9.0
    # via baybe (pyproject.toml)
scikit-learn==1.5.1
@@ -985,7 +998,9 @@ tzdata==2024.1
uri-template==1.3.0
    # via jsonschema
urllib3==2.2.2
    # via requests
    # via
    #   botocore
    #   requests
uv==0.3.0
    # via
    #   baybe (pyproject.toml)
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -12,6 +12,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- `benchmarks` subpackage for defining and running performance tests
- `Campaign.toggle_discrete_candidates` to dynamically in-/exclude discrete candidates
- `DiscreteConstraint.get_valid` to conveniently access valid candidates
- Functionality for persisting benchmarking results on S3 from a manual pipeline run

### Changed
- `SubstanceParameter` encodings are now computed exclusively with the
2 changes: 1 addition & 1 deletion CONTRIBUTORS.md
@@ -29,4 +29,4 @@
- Karin Hrovatin (Merck KGaA, Darmstadt, Germany):\
  `scikit-fingerprints` support
- Fabian Liebig (Merck KGaA, Darmstadt, Germany):\
  Benchmarking structure
  Benchmarking structure and persistence capabilities for benchmarking results
85 changes: 85 additions & 0 deletions benchmarks/README.md
@@ -0,0 +1,85 @@
This module contains benchmarks meant to test the performance of BayBE on
pre-defined tasks. All benchmarks can be executed at once with
the following command:

```bash
python -m benchmarks
```

# `Benchmark`

The `Benchmark` object bundles all benchmark-related data.
At its heart is the callable `function`, which wraps the code to be benchmarked.
The `name` serves as the unique identifier of the benchmark. Note that
this identifier is also used when storing a `Result`, so any change to it is
treated as a new benchmark. The `function`'s `__doc__` is used to
automatically set the `description`. A full code example can be found in the
`domains/synthetic_2C1D_1C.py` file.
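
The following sketch outlines roughly what such a definition might look like. The
constructor arguments (`function`, `settings`) and the settings values shown here are
assumptions for illustration only; `domains/synthetic_2C1D_1C.py` remains the
authoritative example.

```python
from pandas import DataFrame

from benchmarks.definition.config import (
    Benchmark,
    ConvergenceExperimentSettings,
)


def my_benchmark(settings: ConvergenceExperimentSettings) -> DataFrame:
    """Compare two hypothetical campaign setups on a toy task."""
    # Run the BayBE scenarios to be benchmarked and return the collected data.
    return DataFrame()


benchmark = Benchmark(
    function=my_benchmark,
    # `random_seed` is inherited from `BenchmarkSettings`; the concrete settings
    # class may require additional scenario-specific fields not shown here.
    settings=ConvergenceExperimentSettings(random_seed=1337),
)
```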

# `BenchmarkSettings`

The `BenchmarkSettings` object is used to parameterize the benchmark `function`.
It is an abstract base class that can be extended by the user to provide
additional information. Its only predefined attribute is
`random_seed`, which is used to seed the entire call of the benchmark `function`.
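
A custom settings class could, for instance, look like the following sketch (the
extra field and its name are purely hypothetical):

```python
from attrs import define, field
from attrs.validators import instance_of

from benchmarks.definition.config import BenchmarkSettings


@define(frozen=True)
class MyCustomSettings(BenchmarkSettings):
    """Hypothetical settings carrying one additional benchmark parameter."""

    n_repetitions: int = field(validator=instance_of(int), default=3)
```
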
Currently, the following settings are available:

## `ConvergenceExperimentSettings`

The `ConvergenceExperimentSettings` object is used to parameterize the
convergence experiment benchmarks and holds information used for BayBE scenario
executions. Please refer to the BayBE documentation for more information
about the [simulations subpackage](baybe.simulation).

# `Result`

The `Result` object encapsulates the outcome of a `Benchmark` run: the data returned
by the benchmark `function`, together with state information captured at the time of
execution.

## `ResultMetadata`

The `ResultMetadata` wraps the runtime information about the executed `Benchmark`.
The combination of the benchmark identifier and the metadata is meant to identify a
`Result` uniquely, under the assumption that identical code states yield equally
representative results thanks to the fixed random seed.
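
Judging from the storage key layout described below, this runtime information covers
details such as the branch, the BayBE version, the execution date, and the commit
hash; the exact set of fields is defined by the `ResultMetadata` class itself.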

# Add your benchmark to the benchmarking module

As a last step, your benchmark object has to be added to the
`benchmarks module`. This is done by adding the object to the `BENCHMARKS`
list in the `__init__.py` file of the `domains` folder, as sketched below. The
`BENCHMARKS` list contains all objects that are called when running the
`benchmarks module`.
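
For illustration, the registration could look roughly like the excerpt below; the
imported name `benchmark` is an assumption based on the example file mentioned above.

```python
# benchmarks/domains/__init__.py (illustrative excerpt)
from benchmarks.domains.synthetic_2C1D_1C import benchmark as synthetic_2C1D_1C

BENCHMARKS = [
    synthetic_2C1D_1C,
    # my_benchmark,  # <- add your new benchmark object here
]
```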

# Persisting Results

`Result`s are stored automatically. Since multiple storage types with different
requirements and conventions are provided, the `PathConstructor` class is used to
construct the identifier of the stored file. For example, `S3ObjectStorage` stores
`Result`s in an S3 bucket, where the key is separated by `/` without creating real
folders, while the local persistence joins the path components with `_` so that no
folders have to be created. The class handling the storage receives this
`PathConstructor` and uses it to build the identifier in the form it requires.
The following types of storage are available:

## `LocalFileObjectStorage`

Stores a file on the local file system; it is chosen automatically when the
`benchmarks module` does not run in the CI/CD pipeline. A prefix folder path can be
provided when creating the object; if none is given, the file is stored in the
current working directory. With a prefix, the file is stored in the following format:
`<PREFIX_PATH>/<benchmark_name>_<branch>_<latest_baybe_tag>_<execution-date>_<commit_hash>_result.json`.
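
For example, a run with the prefix `results` might end up under a path like
`results/synthetic_2C1D_1C_main_0.11.3_2024-11-27_a1b2c3d_result.json` (all values
hypothetical).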

## `S3ObjectStorage`

Stores a file in an S3 bucket; it is chosen automatically when the
`benchmarks module` runs in the CI/CD pipeline. To locate the S3 bucket used for
persistence, the environment variable `BAYBE_BENCHMARKING_PERSISTENCE_PATH` must be
set to its name. For running the `benchmarks module` in the CI/CD pipeline,
it must also be possible to assume an AWS role from a job call.
This is done by providing the role's ARN in the secret `AWS_ROLE_TO_ASSUME`.
For creating temporary credentials, a GitHub App is used.
To generate a token, the ID of the GitHub App and its secret key must be provided in
the secrets `APP_ID` and `APP_PRIVATE_KEY`. The file will be stored in the following
format: `<benchmark_name>/<branch>/<latest_baybe_tag>/<execution-date>/<commit_hash>/result.json`.
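
The choice between the two storage types is made automatically based on the `CI`
environment variable. The following sketch is a thin wrapper around the calls made in
`benchmarks/__main__.py`; the helper name `persist` is hypothetical.

```python
import os

from benchmarks.persistence import (
    LocalFileObjectStorage,
    PathConstructor,
    S3ObjectStorage,
)


def persist(benchmark, result) -> None:
    """Persist a single benchmark result, mirroring ``benchmarks/__main__.py``."""
    path_constructor = PathConstructor.from_result(result)
    persist_dict = benchmark.to_dict() | result.to_dict()

    # S3 persistence in the CI/CD pipeline, local files otherwise.
    storage = S3ObjectStorage() if "CI" in os.environ else LocalFileObjectStorage()
    storage.write_json(persist_dict, path_constructor)
```
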
16 changes: 15 additions & 1 deletion benchmarks/__main__.py
@@ -1,13 +1,27 @@
"""Executes the benchmarking module."""
# Run this via 'python -m benchmarks' from the root directory.

import os

from benchmarks.domains import BENCHMARKS
from benchmarks.persistence import (
    LocalFileObjectStorage,
    PathConstructor,
    S3ObjectStorage,
)

RUNS_IN_CI = "CI" in os.environ


def main():
"""Run all benchmarks."""
for benchmark in BENCHMARKS:
benchmark()
result = benchmark()
path_constructor = PathConstructor.from_result(result)
persist_dict = benchmark.to_dict() | result.to_dict()

object_storage = S3ObjectStorage() if RUNS_IN_CI else LocalFileObjectStorage()
object_storage.write_json(persist_dict, path_constructor)


if __name__ == "__main__":
Expand Down
19 changes: 16 additions & 3 deletions benchmarks/definition/config.py
@@ -8,15 +8,16 @@

from attrs import define, field
from attrs.validators import instance_of
from cattr.gen import make_dict_unstructure_fn, override
from pandas import DataFrame

from baybe.serialization.mixin import SerialMixin
from baybe.utils.random import temporary_seed
from benchmarks.result import Result, ResultMetadata
from benchmarks.serialization import BenchmarkSerialization, converter


@define(frozen=True)
class BenchmarkSettings(SerialMixin, ABC):
class BenchmarkSettings(ABC, BenchmarkSerialization):
"""Benchmark configuration for recommender analyses."""

random_seed: int = field(validator=instance_of(int), kw_only=True, default=1337)
@@ -41,7 +42,7 @@ class ConvergenceExperimentSettings(BenchmarkSettings):


@define(frozen=True)
class Benchmark(Generic[BenchmarkSettingsType]):
class Benchmark(Generic[BenchmarkSettingsType], BenchmarkSerialization):
"""The base class for a benchmark executable."""

settings: BenchmarkSettingsType = field()
@@ -88,3 +89,15 @@ def __call__(self) -> Result:
        )

        return Result(self.name, result, metadata)


# Register un-/structure hooks
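# The hook below adds the derived `description` to the unstructured output and
# omits the `function` callable from the serialized payload.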
converter.register_unstructure_hook(
    Benchmark,
    lambda o: dict(
        {"description": o.description},
        **make_dict_unstructure_fn(Benchmark, converter, function=override(omit=True))(
            o
        ),
    ),
)
9 changes: 9 additions & 0 deletions benchmarks/persistence/__init__.py
@@ -0,0 +1,9 @@
"""Module for persisting benchmarking results."""

from benchmarks.persistence.persistence import (
    LocalFileObjectStorage,
    PathConstructor,
    S3ObjectStorage,
)

__all__ = ["PathConstructor", "S3ObjectStorage", "LocalFileObjectStorage"]