A dynamic benchmark for gene regulatory network (GRN) inference

Benchmarking GRN inference methods. The full documentation is hosted on ReadTheDocs.

Path to source: src

Installation

You need to have Docker, Java, and Viash installed. Follow these instructions to install the required dependencies.

Download resources

git clone git@github.com:openproblems-bio/task_grn_inference.git

cd task_grn_inference

# download resources
scripts/download_resources.sh

Infer a GRN

viash run src/methods/dummy/config.vsh.yaml -- --multiomics_rna resources/grn-benchmark/multiomics_rna.h5ad --multiomics_atac resources/grn-benchmark/multiomics_atac.h5ad --prediction output/dummy.csv

Similarly, run the command for other methods.
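
To run several methods in a row, the command above can be assembled programmatically. The following is a minimal sketch, assuming that every method follows the layout of the dummy method (`src/methods/<name>/config.vsh.yaml`). `build_infer_command` and `run_method` are hypothetical helpers, not part of the repository.

```python
import subprocess

# Resource directory used by the command shown above.
RESOURCES = "resources/grn-benchmark"


def build_infer_command(method: str, out_dir: str = "output") -> list[str]:
    """Build the `viash run` argument list for one method (hypothetical helper).

    Assumes the method's config sits at src/methods/<method>/config.vsh.yaml,
    as it does for the dummy method.
    """
    return [
        "viash", "run", f"src/methods/{method}/config.vsh.yaml", "--",
        "--multiomics_rna", f"{RESOURCES}/multiomics_rna.h5ad",
        "--multiomics_atac", f"{RESOURCES}/multiomics_atac.h5ad",
        "--prediction", f"{out_dir}/{method}.csv",
    ]


def run_method(method: str) -> None:
    """Run one method; raises CalledProcessError if viash exits non-zero."""
    subprocess.run(build_infer_command(method), check=True)
```

Calling `run_method("dummy")` from the repository root would execute the same command as above; other method names have to match a directory that actually exists under `src/methods`.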

Evaluate a GRN

scripts/benchmark_grn.sh --grn resources/grn-benchmark/models/collectri.csv

Similarly, run the command for other GRN models.

Add a method

To add a method to the repository, follow the instructions in the scripts/add_a_method.sh script.

Motivation

GRNs are essential for understanding cellular identity and behavior. They are simplified models of gene expression regulated by complex processes involving multiple layers of control, from transcription to post-transcriptional modifications, incorporating various regulatory elements and non-coding RNAs. Gene transcription is controlled by a regulatory complex that includes transcription factors (TFs), cis-regulatory elements (CREs) like promoters and enhancers, and essential co-factors. High-throughput datasets, covering thousands of genes, facilitate the use of machine learning approaches to decipher GRNs. The advent of single-cell sequencing technologies, such as scRNA-seq, has made it possible to infer GRNs from a single experiment due to the abundance of samples. This allows researchers to infer condition-specific GRNs, such as for different cell types or diseases, and study potential regulatory factors associated with these conditions. Combining chromatin accessibility data with gene expression measurements has led to the development of enhancer-driven GRN (eGRN) inference pipelines, which offer significantly improved accuracy over single-modality methods.

Description

Here, we present a dynamic benchmark platform for GRN inference. This platform provides curated datasets for GRN inference and evaluation, standardized evaluation protocols and metrics, computational infrastructure, and a dynamically updated leaderboard to track state-of-the-art methods. It runs novel GRNs in the cloud, offers competition scores, and stores them for future comparisons, reflecting new developments over time.

The platform supports the integration of new datasets and protocols. When a new feature is added, previously evaluated GRNs are re-assessed, and the leaderboard is updated accordingly. The aim is to evaluate both the accuracy and completeness of inferred GRNs. It is designed for both single-modality and multi-omics GRN inference. Ultimately, it is a community-driven platform. So far, six eGRN inference methods have been integrated: Scenic+, CellOracle, FigR, scGLUE, GRaNIE, and ANANSE.

Due to its flexible nature, the platform can incorporate various benchmark datasets and evaluation methods, using either prior knowledge or feature-based approaches. In the current version, due to the absence of standardized prior knowledge, we use a feature-based approach to benchmark GRNs. Our evaluation utilizes standardized datasets for GRN inference and evaluation, employing multiple regression analysis approaches to assess both accuracy and comprehensiveness.
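
The idea behind a feature-based evaluation can be illustrated with a toy example. The sketch below assumes one plausible reading of the approach: for each target gene, the regulators that the GRN assigns to it serve as features in a ridge regression that predicts the target's expression, and a better GRN yields a higher R². It is not the benchmark's implementation. The function names, gene names, and data are hypothetical, and the R² is computed in-sample for brevity, whereas a real evaluation would use held-out data.

```python
import numpy as np


def ridge_r2(X: np.ndarray, y: np.ndarray, alpha: float = 1.0) -> float:
    """Fit ridge regression in closed form and return the in-sample R^2."""
    # Center features and response so that no intercept term is needed.
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Closed-form ridge solution: w = (X^T X + alpha * I)^-1 X^T y
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    ss_res = float(np.sum((yc - Xc @ w) ** 2))
    ss_tot = float(np.sum(yc ** 2))
    return 1.0 - ss_res / ss_tot


def score_grn(expr: dict[str, np.ndarray],
              grn: list[tuple[str, str, float]]) -> dict[str, float]:
    """Regress each target gene on the regulators the GRN assigns to it."""
    regulators: dict[str, list[str]] = {}
    for source, target, _weight in grn:
        if source in expr and target in expr:
            regulators.setdefault(target, []).append(source)
    scores = {}
    for target, sources in regulators.items():
        X = np.column_stack([expr[s] for s in sources])
        scores[target] = ridge_r2(X, expr[target])
    return scores


# Toy data: GENE_1 is driven by TF_A and is unrelated to TF_B.
rng = np.random.default_rng(0)
tf_a = rng.normal(size=200)
tf_b = rng.normal(size=200)
expr = {
    "TF_A": tf_a,
    "TF_B": tf_b,
    "GENE_1": 2.0 * tf_a + 0.1 * rng.normal(size=200),
}
good = score_grn(expr, [("TF_A", "GENE_1", 1.0)])  # correct regulator
bad = score_grn(expr, [("TF_B", "GENE_1", 1.0)])   # wrong regulator
```

In this toy setting the GRN that names the true regulator explains almost all of the variance in `GENE_1`, while the GRN with the wrong regulator explains close to none, which is the kind of contrast a feature-based score is meant to capture.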

Authors & contributors

name	roles
Jalil Nourisa	author
Robrecht Cannoodt	author
Antoine Passimier	contributor
Christian Arnold	contributor
Marco Stock	contributor

API

flowchart LR
  file_multiomics_rna_h5ad("multiomics rna")
  comp_method[/"Method"/]
  file_prediction("GRN")
  comp_metric[/"Label"/]
  file_score("Score")
  file_multiomics_atac_h5ad("multiomics atac")
  file_perturbation_h5ad("perturbation")
  comp_control_method[/"Control Method"/]
  comp_method_r[/"Method r"/]
  file_multiomics_rna_h5ad---comp_method
  comp_method-->file_prediction
  file_prediction---comp_metric
  comp_metric-->file_score
  file_multiomics_atac_h5ad---comp_method
  file_perturbation_h5ad---comp_metric
  comp_control_method-->file_prediction
  comp_method_r-->file_prediction

File format: multiomics rna

RNA expression for multiomics data.

Example file: resources_test/grn-benchmark/multiomics_rna.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'

Slot description:

Slot	Type	Description
`obs["cell_type"]`	`string`	The annotated cell type of each cell based on RNA expression.
`obs["donor_id"]`	`string`	Donor id.

Component type: Method

Path: src/methods

A GRN inference method

Arguments:

Name	Type	Description
`--multiomics_rna`	`file`	(Optional) RNA expression for multiomics data. Default: `resources/grn-benchmark/multiomics_rna.h5ad`.
`--multiomics_atac`	`file`	(Optional) Peak data for multiomics data. Default: `resources/grn-benchmark/multiomics_atac.h5ad`.
`--prediction`	`file`	(Optional, Output) GRN prediction. Default: `output/prediction.csv`.
`--temp_dir`	`string`	(Optional) NA. Default: `output/temdir`.
`--num_workers`	`integer`	(Optional) NA. Default: `4`.
`--tf_all`	`file`	(Optional) NA. Default: `resources/prior/tf_all.csv`.
`--max_n_links`	`integer`	(Optional) NA. Default: `50000`.

File format: GRN

GRN prediction

Example file: resources_test/grn_models/collectri.csv

Format:

Tabular data
 'source', 'target', 'weight'

Slot description:

Column	Type	Description
`source`	`string`	Source of regulation.
`target`	`string`	Target of regulation.
`weight`	`float`	Weight of regulation.
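
A quick check that a prediction file has the three documented columns can catch format problems before evaluation. This is a minimal sketch using only the standard library. It assumes a comma-separated file with a header row; `read_grn` is a hypothetical helper and the gene names are made up.

```python
import csv
import io

REQUIRED_COLUMNS = ("source", "target", "weight")


def read_grn(handle) -> list[tuple[str, str, float]]:
    """Read a GRN table and return (source, target, weight) tuples.

    Raises ValueError if a required column is missing or a weight
    cannot be parsed as a float. Extra columns are ignored.
    """
    reader = csv.DictReader(handle)
    fieldnames = reader.fieldnames or []
    missing = [c for c in REQUIRED_COLUMNS if c not in fieldnames]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return [
        (row["source"], row["target"], float(row["weight"]))
        for row in reader
    ]


# In-memory example; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "source,target,weight\n"
    "TF_A,GENE_1,0.8\n"
    "TF_A,GENE_2,-0.3\n"
)
edges = read_grn(example)
```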

Component type: Label

Path: src/metrics

A metric to evaluate the performance of the inferred GRN

Arguments:

Name	Type	Description
`--perturbation_data`	`file`	(Optional) Perturbation dataset for benchmarking. Default: `resources/grn-benchmark/perturbation_data.h5ad`.
`--prediction`	`file`	GRN prediction.
`--score`	`file`	(Optional, Output) File indicating the score of a metric. Default: `output/score.h5ad`.
`--reg_type`	`string`	(Optional) Name of the regression to use. Default: `ridge`.
`--subsample`	`integer`	(Optional) Number of samples randomly drawn from the perturbation data. Default: `-2`.
`--max_workers`	`integer`	(Optional) NA. Default: `4`.
`--method_id`	`string`	(Optional) NA.
`--tf_all`	`file`	(Optional) NA. Default: `resources/prior/tf_all.csv`.
`--apply_tf`	`boolean`	(Optional) NA. Default: `TRUE`.
`--clip_scores`	`boolean`	(Optional) Clips the R² score of each gene to the range [0, 1]. Default: `TRUE`.
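
The effect of `--clip_scores` can be shown with a small example. R² has no lower bound, so a single poorly predicted gene can dominate an aggregate score; clipping limits that influence. The sketch below is illustrative only: the function mirrors the flag's description, while the gene names, the values, and the use of a plain mean for aggregation are assumptions.

```python
def clip_scores(r2_per_gene: dict[str, float]) -> dict[str, float]:
    """Clip each per-gene R^2 score to the range [0, 1]."""
    return {gene: min(1.0, max(0.0, r2)) for gene, r2 in r2_per_gene.items()}


# Hypothetical per-gene scores; GENE_2 is predicted very poorly.
raw = {"GENE_1": 0.62, "GENE_2": -3.5, "GENE_3": 0.0}
clipped = clip_scores(raw)

# Aggregating by the mean is an assumption made for this illustration.
mean_raw = sum(raw.values()) / len(raw)
mean_clipped = sum(clipped.values()) / len(clipped)
```

Without clipping, the single negative value pulls the mean below zero; with clipping, `GENE_2` contributes 0 and the mean stays within [0, 1].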

File format: Score

File indicating the score of a metric.

Example file: resources_test/scores/score.h5ad

Format:

AnnData object
 uns: 'dataset_id', 'method_id', 'metric_ids', 'metric_values'

Slot description:

Slot	Type	Description
`uns["dataset_id"]`	`string`	A unique identifier for the dataset.
`uns["method_id"]`	`string`	A unique identifier for the method.
`uns["metric_ids"]`	`string`	One or more unique metric identifiers.
`uns["metric_values"]`	`double`	The metric values obtained for the given prediction. Must have the same length as `metric_ids`.
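
The constraints in the table can be checked with a few lines of code. This sketch operates on a plain dictionary that mirrors the `uns` slot, so it runs without extra dependencies; in practice the mapping could come from `anndata.read_h5ad(path).uns`. `validate_score_uns` is a hypothetical helper, and the identifiers in the example are placeholders.

```python
def validate_score_uns(uns: dict) -> None:
    """Check that a mapping mirroring `uns` has the documented slots.

    Raises ValueError if a slot is missing or if `metric_ids` and
    `metric_values` differ in length.
    """
    for key in ("dataset_id", "method_id", "metric_ids", "metric_values"):
        if key not in uns:
            raise ValueError(f"missing uns slot: {key}")

    ids, values = uns["metric_ids"], uns["metric_values"]
    # A single metric may be stored as a bare string or number.
    if isinstance(ids, str):
        ids = [ids]
    if isinstance(values, (int, float)):
        values = [values]
    if len(ids) != len(values):
        raise ValueError(
            f"metric_ids has {len(ids)} entries, "
            f"metric_values has {len(values)}"
        )


# Placeholder identifiers and values, for illustration only.
score_uns = {
    "dataset_id": "example_dataset",
    "method_id": "dummy",
    "metric_ids": ["metric_a", "metric_b"],
    "metric_values": [0.31, 0.27],
}
validate_score_uns(score_uns)  # passes silently
```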

File format: multiomics atac

Peak data for multiomics data.

Example file: resources_test/grn-benchmark/multiomics_atac.h5ad

Format:

AnnData object
 obs: 'cell_type', 'donor_id'

Slot description:

Slot	Type	Description
`obs["cell_type"]`	`string`	The annotated cell type of each cell based on RNA expression.
`obs["donor_id"]`	`string`	Donor id.

File format: perturbation

Perturbation dataset for benchmarking.

Example file: resources_test/grn-benchmark/perturbation_data.h5ad

Format:

AnnData object
 obs: 'cell_type', 'sm_name', 'donor_id', 'plate_name', 'row', 'well', 'cell_count'
 layers: 'n_counts', 'pearson', 'lognorm'

Slot description:

Slot	Type	Description
`obs["cell_type"]`	`string`	The annotated cell type of each cell based on RNA expression.
`obs["sm_name"]`	`string`	The primary name for the (parent) compound (in a standardized representation) as chosen by LINCS. This is provided to map the data in this experiment to the LINCS Connectivity Map data.
`obs["donor_id"]`	`string`	Donor id.
`obs["plate_name"]`	`string`	Plate name (6 levels).
`obs["row"]`	`string`	Row name on the plate.
`obs["well"]`	`string`	Well name on the plate.
`obs["cell_count"]`	`string`	Number of single cells pseudobulked.
`layers["n_counts"]`	`double`	Pseudobulked values using mean approach.
`layers["pearson"]`	`double`	(Optional) Normalized values using Pearson residuals.
`layers["lognorm"]`	`double`	(Optional) Normalized values using the shifted logarithm.
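
To make the `n_counts` and `lognorm` layers more concrete, the sketch below pseudobulks a group of cells by averaging and then applies a shifted logarithm. It is a rough illustration, not the pipeline that produced the dataset: the shifted logarithm is assumed here to be `log(1 + x)` without any size-factor scaling, and the function names and counts are made up.

```python
import math


def pseudobulk_mean(cells: list[list[float]]) -> list[float]:
    """Average the counts of the cells in one group, gene by gene."""
    n_cells = len(cells)
    return [sum(gene_counts) / n_cells for gene_counts in zip(*cells)]


def shifted_log(values: list[float]) -> list[float]:
    """Shifted logarithm log(1 + x) (assumed form, no size-factor scaling)."""
    return [math.log1p(v) for v in values]


# Two hypothetical cells (rows) measured on three genes (columns).
cells = [
    [0.0, 2.0, 10.0],
    [2.0, 4.0, 30.0],
]
bulk = pseudobulk_mean(cells)   # per-gene mean across the two cells
lognorm = shifted_log(bulk)
```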

Component type: Control Method

Path: src/control_methods

A control method.

Arguments:

Name	Type	Description
`--layer`	`string`	(Optional) Which layer of the perturbation data to use to find TF-gene relationships. Default: `scgen_pearson`.
`--prediction`	`file`	(Optional, Output) GRN prediction.
`--tf_all`	`file`	NA.

Component type: Method r

Path: src/methods_r

A GRN inference method

Arguments:

Name	Type	Description
`--multiomics_rna_r`	`file`	(Optional) NA.
`--multiomics_atac_r`	`file`	(Optional) NA.
`--prediction`	`file`	(Optional, Output) GRN prediction.
`--temp_dir`	`string`	(Optional) NA. Default: `output/temdir`.
`--num_workers`	`integer`	(Optional) NA. Default: `4`.
