Refactor project structure to support additional modules (#333)
For example, adapters for other benchmarking tools.
jcamachor authored Sep 19, 2024
1 parent 896ee2a commit f2752fd
Showing 1,119 changed files with 854 additions and 690 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/maven.yaml
@@ -90,7 +90,7 @@ jobs:
id: yaml-data
uses: jbutcher5/[email protected]
with:
- file: './src/test/resources/config/spark/experiment_config-${{ matrix.lst }}.yaml'
+ file: './core/src/test/resources/config/spark/experiment_config-${{ matrix.lst }}.yaml'
key-path: '["parameter_values", "external_data_path"]'
- name: Write properties to environment
run: echo "external_data_path=${{ steps.yaml-data.outputs.data }}" >> $GITHUB_ENV
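
As a rough illustration of what these two steps do together, the `key-path` names a nested lookup into the experiment config, and the next step appends the result to `$GITHUB_ENV` as a `NAME=value` line. The Python sketch below mimics that flow on an in-memory dictionary. The `lookup` and `env_line` helpers and the sample path are hypothetical; they are not part of the workflow or of the action it uses.

```python
from functools import reduce
from typing import Any, Sequence


def lookup(config: dict, key_path: Sequence[str]) -> Any:
    """Walk a nested mapping along key_path and return the value found there."""
    return reduce(lambda node, key: node[key], key_path, config)


def env_line(name: str, value: Any) -> str:
    """Format a NAME=value line of the kind appended to $GITHUB_ENV."""
    return f"{name}={value}"


# Stand-in for the parsed experiment config; the path value is made up.
config = {"parameter_values": {"external_data_path": "/tmp/lst-bench/data/"}}

value = lookup(config, ["parameter_values", "external_data_path"])
line = env_line("external_data_path", value)
print(line)
```

A missing key raises `KeyError` here, which is one reasonable way to surface a config file that moved without the workflow path being updated.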
8 changes: 4 additions & 4 deletions .github/workflows/webapp-deploy.yaml
@@ -21,8 +21,8 @@ name: Build and deploy Web App - lst-bench
on:
push:
paths:
- - metrics/**
- - run/**
+ - core/metrics/**
+ - core/run/**
branches:
- main
workflow_dispatch:
@@ -32,7 +32,7 @@ permissions:

env:
AZURE_WEBAPP_NAME: lst-bench
- WORKING_DIRECTORY: './metrics/app'
+ WORKING_DIRECTORY: './core/metrics/app'
STARTUP_COMMAND: 'python -m streamlit run main.py --server.port 8000 --server.address 0.0.0.0 --client.toolbarMode minimal'

jobs:
@@ -61,7 +61,7 @@ jobs:
- name: 'Copy .duckdb files from ./run/'
run: |
- find ./run -type f -name "*.duckdb" -exec cp {} ${{ env.WORKING_DIRECTORY }} \;
+ find ./core/run -type f -name "*.duckdb" -exec cp {} ${{ env.WORKING_DIRECTORY }} \;
- name: Zip artifact for deployment
working-directory: ${{ env.WORKING_DIRECTORY }}
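
For readers less familiar with `find -exec`, the copy step above gathers every `.duckdb` file under `./core/run` into the web app's working directory, flattening the directory layout. Below is a minimal Python sketch of the same idea, assuming file names are unique (a later file with the same name would overwrite an earlier one). `copy_duckdb_files` and the demo directory names are illustrative and not part of the repository.

```python
import shutil
import tempfile
from pathlib import Path


def copy_duckdb_files(src_root: Path, dest_dir: Path) -> list[Path]:
    """Copy every *.duckdb file under src_root into dest_dir, without subdirectories."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(src_root.rglob("*.duckdb")):
        if src.is_file():
            copied.append(Path(shutil.copy2(src, dest_dir / src.name)))
    return copied


# Demo on a throwaway tree that loosely mirrors core/run and core/metrics/app.
with tempfile.TemporaryDirectory() as tmp:
    run_dir = Path(tmp) / "core" / "run"
    app_dir = Path(tmp) / "core" / "metrics" / "app"
    (run_dir / "engine" / "results").mkdir(parents=True)
    (run_dir / "engine" / "results" / "results.duckdb").write_bytes(b"demo")
    (run_dir / "engine" / "README.md").write_text("not copied")
    copied_names = [p.name for p in copy_duckdb_files(run_dir, app_dir)]
    app_listing = sorted(p.name for p in app_dir.iterdir())

print(copied_names)
```

Unlike the shell step, the sketch creates the destination directory if it is missing; the workflow relies on the directory already being checked out.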
5 changes: 1 addition & 4 deletions .gitignore
@@ -65,9 +65,6 @@ local.properties
.DS_Store
/target
/*/target
- /example/*/target
- /build
- /*/build
- /example/*/build
+ /adapters/*/target
/buildSrc/build
/buildSrc/subprojects/*/build
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
[submodule "adapters/cab-converter/cab"]
path = adapters/cab-converter/cab
url = https://github.com/alexandervanrenen/cab
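
`.gitmodules` uses Git's INI-style config syntax, so the new entry can be sanity-checked with Python's standard `configparser`. This is only a sketch: `configparser` is not a full Git config parser, and `read_submodules` is a hypothetical helper.

```python
import configparser

GITMODULES = """\
[submodule "adapters/cab-converter/cab"]
path = adapters/cab-converter/cab
url = https://github.com/alexandervanrenen/cab
"""


def read_submodules(text: str) -> dict[str, dict[str, str]]:
    """Map each submodule name to its key/value settings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    result = {}
    for section in parser.sections():
        # Section headers look like: submodule "<name>"
        kind, _, name = section.partition(" ")
        if kind == "submodule":
            result[name.strip('"')] = dict(parser[section])
    return result


submodules = read_submodules(GITMODULES)
print(submodules)
```

In a real checkout, `git config --file .gitmodules --list` reads the same information with Git's own parser and is the more reliable option.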
File renamed without changes.
5 changes: 3 additions & 2 deletions .mvn/wrapper/maven-wrapper.properties
@@ -14,5 +14,6 @@
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.
- distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.2/apache-maven-3.9.2-bin.zip
- wrapperUrl=https://repo.maven.apache.org/maven2/org/apache/maven/wrapper/maven-wrapper/3.2.0/maven-wrapper-3.2.0.jar
+ wrapperVersion=3.3.2
+ distributionType=only-script
+ distributionUrl=https://repo.maven.apache.org/maven2/org/apache/maven/apache-maven/3.9.9/apache-maven-3.9.9-bin.zip
13 changes: 8 additions & 5 deletions README.md
@@ -80,7 +80,7 @@ usage: ./launcher.sh -c <arg> -e <arg> -l <arg> -t <arg> -w <arg>
## Configuration Files
The configuration files used in LST-Bench are YAML files.

- You can find their schema, which describes the expected structure and properties, [here](src/main/resources/schemas).
+ You can find their schema, which describes the expected structure and properties, [here](core/src/main/resources/schemas).

NOTE: The spark schemas are configured for Spark 3.3 or earlier. In case you plan to use Spark 3.4, the setup and setup_data_maintenance tasks need to be
modified to handle [SPARK-44025](https://issues.apache.org/jira/browse/SPARK-44025). Columns in CSV tables need to defined as `STRING` instead of `VARCHAR` or `CHAR`.
@@ -91,7 +91,7 @@ Append the following regex replacement to the setup and setup_data_maintenance p
replacement: 'string'
```

- Additionally, you can find sample configurations that can serve as guidelines for creating your configurations [here](src/main/resources/config).
+ Additionally, you can find sample configurations that can serve as guidelines for creating your configurations [here](core/src/main/resources/config).
The YAML file can also contain references to environment variable along with default values. The parser will handle the same appropriately.
Example:
```bash
@@ -100,16 +100,19 @@ Example:

## Architecture

- The LST-Bench code is organized into two modules:
+ The core of LST-Bench is organized into two modules:

1. **Java Application.** This module is written entirely in Java and is responsible for executing SQL workloads against a system under test using JDBC.
It reads input configuration files to determine the tasks, sessions, and phases to be executed.
The Java application handles the execution of SQL statements and manages the interaction with the system under test.

- 2. **Python Processing Module.** The processing module is written in Python and serves as the post-execution analysis component.
+ 2. **Python Metrics Module.** The metrics module is written in Python and serves as the post-execution analysis component.
It consolidates experimental results obtained from the Java application and computes metrics to provide insights into LSTs and cloud data warehouses.
The Python module performs data processing, analysis, and visualization to facilitate a deeper understanding of the experimental results.

+ Additionally, the **Adapters** module is designed to handle integration with external tools and systems by converting outputs from third-party benchmarks into formats compatible with LST-Bench.
+ One example of this is the **CAB to LST-Bench converter**, which transforms results from the Cloud Analytics Benchmark (CAB) into a format that can be used by LST-Bench for further analysis.

### LST-Bench Concepts
In LST-Bench, we utilize specific concepts to define and organize SQL workloads, with a focus on maximizing flexibility and facilitating reusability across various workloads. For detailed information, refer to our [documentation](docs/workloads.md).

@@ -123,7 +126,7 @@ The telemetry registry in LST-Bench is configurable, providing flexibility for d
By default, LST-Bench includes an implementation for a JDBC-based registry and supports writing telemetry to DuckDB or Spark.
LST-Bench writes these telemetry events into a table within the specified systems, enabling any application to consume and gain insights from the results.

- Alternatively, if the LST-Bench [Metrics Processor](metrics) is used, you can simply point it to the same database.
+ Alternatively, if the LST-Bench [Metrics Processor](core/metrics) is used, you can simply point it to the same database.
The processor will then analyze and visualize the results, providing a streamlined solution for result analysis and visualization.

## Documentation
51 changes: 51 additions & 0 deletions adapters/cab-converter/README.md
@@ -0,0 +1,51 @@
<!--
{% comment %}
Copyright (c) Microsoft Corporation.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
{% endcomment %}
-->

# Cloud Analytics Benchmark (CAB) to LST-Bench Converter

This module converts output files from the Cloud Analytics Benchmark (CAB) to the format accepted by the LST-Bench framework.
The CAB repository is included as a Git submodule, and you can build it separately using the instructions provided in its own `README` file.

## Setup Instructions

### 1. Clone the LST-Bench Project with Submodules
To get the `cab-converter` project, you need to clone the main LST-Bench repository and ensure the CAB submodule is initialized.

Run the following command to clone the repository with all submodules:

```bash
git clone --recurse-submodules https://github.com/microsoft/lst-bench.git
```

If you've already cloned the repository without the submodules, you can initialize them manually by running:

```bash
cd adapters/cab-converter
git submodule update --init --recursive
```

This will pull in the CAB repository under the `cab` directory.
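
A clone made without `--recurse-submodules` leaves `adapters/cab-converter/cab` as an empty directory until the submodule is initialized. The sketch below is one hedged way to check for that before attempting a build. `submodule_is_populated` is a hypothetical helper, and the demo runs against a temporary directory rather than a real clone.

```python
import tempfile
from pathlib import Path


def submodule_is_populated(repo_root: Path, submodule_path: str) -> bool:
    """Return True if the submodule directory exists and contains at least one entry."""
    directory = repo_root / submodule_path
    return directory.is_dir() and any(directory.iterdir())


with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    cab = repo / "adapters" / "cab-converter" / "cab"
    cab.mkdir(parents=True)
    # Empty directory: what a clone without submodules looks like.
    before = submodule_is_populated(repo, "adapters/cab-converter/cab")
    (cab / "README.md").write_text("placeholder")
    after = submodule_is_populated(repo, "adapters/cab-converter/cab")

print(before, after)
```

`git submodule status` reports the same condition from Git's side and is the authoritative check in a real repository.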

### 2. Build the CAB Project
The CAB project is a separate C++ application with its own build process. For platform-specific build instructions (Linux, macOS, Windows), refer to the CAB `README` located in the `cab` submodule.

### 3. Using the CAB to LST-Bench Converter
Once CAB is built and its output files are generated, you can run the `cab-converter` to transform those files into the format required by LST-Bench.

#### Running the Converter
_TODO: Add more details here._
1 change: 1 addition & 0 deletions adapters/cab-converter/cab
Submodule cab added at d23a4c
34 changes: 34 additions & 0 deletions adapters/cab-converter/pom.xml
@@ -0,0 +1,34 @@
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.microsoft.lst-bench</groupId>
<artifactId>lst-bench</artifactId>
<version>0.1-SNAPSHOT</version>
<relativePath>../../pom.xml</relativePath> <!-- Reference parent POM -->
</parent>

<artifactId>lst-bench-cab-converter</artifactId>
<name>LST-Bench Project CAB Converter</name>

<dependencies>
<dependency>
<groupId>com.fasterxml.jackson.core</groupId>
<artifactId>jackson-databind</artifactId>
<version>${jackson.version}</version>
</dependency>
</dependencies>

<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>${maven-compile-plugin.version}</version>
</plugin>
</plugins>
</build>
</project>
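
The new module inherits from the `lst-bench` parent POM, and its `${jackson.version}` and `${maven-compile-plugin.version}` placeholders are presumably resolved from properties defined there (an assumption; the parent POM is not shown in this diff). Below is a small sketch that uses Python's standard `xml.etree.ElementTree` to list what this POM declares, run on a trimmed copy of the file above. `summarize_pom` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Maven POM elements live in this XML namespace.
NS = {"m": "http://maven.apache.org/POM/4.0.0"}

POM = """\
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <parent>
    <groupId>com.microsoft.lst-bench</groupId>
    <artifactId>lst-bench</artifactId>
    <version>0.1-SNAPSHOT</version>
  </parent>
  <artifactId>lst-bench-cab-converter</artifactId>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>${jackson.version}</version>
    </dependency>
  </dependencies>
</project>
"""


def summarize_pom(xml_text: str) -> dict:
    """Return the module's artifactId, its parent's artifactId, and its dependencies."""
    root = ET.fromstring(xml_text)
    dependencies = [
        (
            dep.findtext("m:groupId", namespaces=NS),
            dep.findtext("m:artifactId", namespaces=NS),
            dep.findtext("m:version", namespaces=NS),
        )
        for dep in root.findall("m:dependencies/m:dependency", NS)
    ]
    return {
        "artifact": root.findtext("m:artifactId", namespaces=NS),
        "parent": root.findtext("m:parent/m:artifactId", namespaces=NS),
        "dependencies": dependencies,
    }


summary = summarize_pom(POM)
print(summary)
```

The version comes back as the literal placeholder string because property resolution is done by Maven, not by the XML parser; `mvn help:effective-pom` shows the resolved values.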
3 changes: 3 additions & 0 deletions adapters/cab-converter/target/maven-archiver/pom.properties
@@ -0,0 +1,3 @@
artifactId=lst-bench-cab-converter
groupId=com.microsoft.lst-bench
version=0.1-SNAPSHOT
8 changes: 4 additions & 4 deletions metrics/app/README.md → core/metrics/app/README.md
@@ -29,11 +29,11 @@ The results displayed in the dashboard are specific to the versions and configur
Their performance is subject to change and improvement through further tuning and future developments.
Thus, the primary aim of sharing them is not to assert that one LST or engine is superior (in terms of speed, cost, etc.) to another.
Instead, it is to showcase LST-Bench's capability in quantifying significant trade-offs across various combinations of engines and LSTs.
- Further details about the runs and setups are available [here](/run).
+ Further details about the runs and setups are available [here](/core/run).

## Adding a New Result
- To include data from a new system, duplicate one of the directories in the [run folder](/run) and modify the necessary files within.
- For a deeper understanding of the directory structure, consult the [README file](/run/README.md).
+ To include data from a new system, duplicate one of the directories in the [run folder](/core/run) and modify the necessary files within.
+ For a deeper understanding of the directory structure, consult the [README file](/core/run/README.md).
The LST-Bench dashboard web app automatically retrieves results from the .duckdb files within those folders and displays them on the dashboard.

Alternatively, you can provide your own paths to search for results via commandline arguments, see below.
@@ -58,7 +58,7 @@ source venv/bin/activate
```

### 3. Install Dependencies
- Install the the necessary packages specified in the requirements.txt using pip:
+ Install the necessary packages specified in the requirements.txt using pip:

```bash
pip install -r requirements.txt
File renamed without changes.