doc: Results parallel (#77)

* google analytics setup
* doc: parallel results
biodatageeks · Jan 18, 2025 · 87c8892
1 parent 8d0d8be

Showing 5 changed files with 26 additions and 9 deletions.
diff --git a/README.md b/README.md
@@ -13,9 +13,16 @@
 [polars-bio](https://pypi.org/project/polars-bio/) is a Python library for genomics built on top of [polars](https://pola.rs/), [Apache Arrow](https://arrow.apache.org/) and [Apache DataFusion](https://datafusion.apache.org/).
 It provides a DataFrame API for genomics data and is designed to be blazing fast, memory efficient and easy to use.
 
+
+## Single-thread performance 🏃‍
 ![overlap-single.png](docs/assets/overlap-single.png)
 
-![nearest-single.png](docs/assets/nearest-single.png)
+![overlap-single.png](docs/assets/nearest-single.png)
+
+## Parallel performance 🏃‍🏃‍
+![overlap-parallel.png](docs/assets/overlap-parallel.png)
+
+![overlap-parallel.png](docs/assets/nearest-parallel.png)
 ## Key Features
 * optimized for [peformance](https://biodatageeks.org/polars-bio/performance/) and large-scale genomics datasets
 * popular genomics [operations](https://biodatageeks.org/polars-bio/features/#genomic-ranges-operations) with a DataFrame API (both [Pandas](https://pandas.pydata.org/) and [polars](https://pola.rs/))

diff --git a/docs/assets/nearest-parallel.png b/docs/assets/nearest-parallel.png
diff --git a/docs/assets/overlap-parallel.png b/docs/assets/overlap-parallel.png
diff --git a/docs/index.md b/docs/index.md
@@ -7,6 +7,17 @@ polars-bio is a :rocket:blazing [fast](performance.md#results-summary-) Python D
 and  [polars](https://pola.rs/).
 It is designed to be easy to use, fast and memory efficient with a focus on genomics data.
 
+## Single-thread performance 🏃‍
+![overlap-single.png](assets/overlap-single.png)
+
+![overlap-single.png](assets/nearest-single.png)
+
+## Parallel performance 🏃‍🏃‍
+![overlap-parallel.png](assets/overlap-parallel.png)
+
+![overlap-parallel.png](assets/nearest-parallel.png)
+
+
 ## Key Features
 * optimized for [peformance](performance.md#results-summary-) and large-scale genomics datasets
 * popular genomics [operations](features.md#genomic-ranges-operations) with a DataFrame API (both [Pandas](https://pandas.pydata.org/) and [polars](https://pola.rs/))

diff --git a/docs/performance.md b/docs/performance.md
@@ -1,15 +1,14 @@
 # Results summary 📈
 
-
-!!! todo
-    - Add summary of the results
-
-## Single-threaded performance 🏃‍
+## Single-thread performance 🏃‍
 ![overlap-single.png](assets/overlap-single.png)
 
 ![overlap-single.png](assets/nearest-single.png)
 
-## Parallel performance 🏃‍🏃‍🏃‍
+## Parallel performance 🏃‍🏃‍
+![overlap-parallel.png](assets/overlap-parallel.png)
+
+![overlap-parallel.png](assets/nearest-parallel.png)
 ## Benchmarks 🧪
 ### Detailed results shortcuts 👨‍🔬
 - [Binary operations](#binary-operations)
@@ -35,7 +34,7 @@
 !!! note
     Test dataset in *Parquet* format can be downloaded from:
 
-    * for [single-threaded](https://drive.google.com/file/d/1lctmude31mSAh9fWjI60K1bDrbeDPGfm/view?usp=sharing) tests
+    * for [single-thread](https://drive.google.com/file/d/1lctmude31mSAh9fWjI60K1bDrbeDPGfm/view?usp=sharing) tests
     * for [parallel](https://drive.google.com/file/d/1Sj7nTB5gCUq9nbeQOg4zzS4tKO37M5Nd/view?usp=sharing) tests (8 partitions per dataset)
 
 ### Test libraries 📚
@@ -720,7 +719,7 @@ pb.ctx.set_option("datafusion.optimizer.repartition_joins", "true")
 pb.ctx.set_option("datafusion.optimizer.repartition_file_scans", "true")
 pb.ctx.set_option("datafusion.execution.coalesce_batches", "false")
 ```
-the `single-threaded` dataset was used (see [Test datasets](#test-datasets))
+the `single-thread` dataset was used (see [Test datasets](#test-datasets))
 
 
 - `polars_bio-n-p`: Custom partitioning schema (constant number of 8 partitions/dataset) without any repartitioning in DataFusion: