docs: edits to the joss paper
AliSajid committed Nov 8, 2023
1 parent 06102e7 commit ce059e8
Showing 1 changed file with 5 additions and 5 deletions.
10 changes: 5 additions & 5 deletions joss/paper.md
@@ -42,23 +42,23 @@ affiliations:

# Summary

- Omics studies look at large amounts of biological data to understand changes that cause different traits or conditions in living things. However, analyzing omics data often results in long, hard-to-understand lists of pathways. PAVER is an R package that automatically curates similar pathways into groups, identifies the most representative pathway of each group, and provides publication-ready intuitive visualizations. PAVER makes it easy to integrate multiple pathway analyses, discover relevant biological insights and can work with any pathway database.
+ Omics studies look at large amounts of biological data to understand changes underlying different traits or conditions in living things. However, analyzing omics data often results in long, hard-to-understand lists of pathways. PAVER is an R package that automatically curates similar pathways into groups, identifies the most representative pathway of each group, and provides intuitive, publication-ready visualizations. PAVER makes it easy to integrate multiple pathway analyses, discover relevant biological insights, and work with any pathway database.

# Statement of Need

- Omics is used extensively in biological research today. However, the development of omics technologies has vastly outpaced the expertise of researchers in its analysis, and the resulting “data deluge” now overwhelms the capacity of human cognition [@RN16; @RN20; @RN19]. Analysis of omics data is therefore the major bottleneck in most research projects today and its use in precision medicine remains limited [@RN26; @RN63]. Pathway analysis has since become ubiquitous to help interpret omics data and elucidate mechanisms of biological phenomena under study [@RN6]. Despite the last decade bringing a host of different computational tools to perform pathway analysis, they each generally result in lists of results too long to manually inspect and extract relevant targets for downstream wet lab validation without introducing biases [@RN5; @RN81]. Interpretation of results is therefore the greatest expense in any omics project [@RN21]. With the total volume of omics data continuing to grow, novel ways of data management are needed [@RN22]. FAIR (Findable, Accessible, Interoperable, Reusable) scientific data principles necessitate automated interpretation of omics results [@RN25].
+ Omics is used extensively in biological research today. However, the development of omics technologies has vastly outpaced the expertise of researchers in its analysis, and the resulting “data deluge” now overwhelms the capacity of human cognition [@RN16; @RN20; @RN19]. Analysis of omics data is therefore a major bottleneck in most research projects today, and its use in precision medicine remains limited for this reason [@RN26; @RN63]. Pathway analysis has become ubiquitous in helping interpret omics data and elucidate mechanisms of biological phenomena under study [@RN6]. Despite the last decade bringing a host of different computational tools to perform pathway analysis, they generally result in lists of results too long to manually inspect and extract relevant targets for downstream validation without introducing biases [@RN5; @RN81]. Interpretation of results is therefore the greatest time cost in any omics project [@RN21]. With the total volume of omics data continuing to grow, novel ways of data management are needed [@RN22]. FAIR (Findable, Accessible, Interoperable, Reusable) scientific data principles necessitate automated interpretation of omics results [@RN25].

# Overview

- PAVER uses vector embeddings to help interpret pathway analyses. Embeddings encode the meaning of pathways into numerical representations which can then be hierarchically clustered and visualized (\autoref{fig:overview}). To identify which pathway is most representative of a cluster, PAVER first takes the average embedding of all pathways in a cluster to capture it's overall meaning into a single numerical representation [@RN49]. It then finds which pathway is most similar to the average embedding and labels the cluster with that pathway. This allows PAVER to automatically curate long lists of pathways into groups and identify which pathway is most representative of each group. PAVER assumes the pathway analysis was properly performed [@9tips].
+ PAVER uses vector embeddings to help interpret pathway analyses. Embeddings encode the meaning of pathways into numerical representations which can then be hierarchically clustered and visualized (\autoref{fig:overview}). To identify which pathway is most representative of a cluster, PAVER first takes the average embedding of all pathways in a cluster to capture its overall meaning in a single numerical representation [@RN49]. It then finds which pathway is most similar to the average embedding and labels the cluster with that pathway. This allows PAVER to automatically curate long lists of pathways into related groups and identify the pathway most representative of each group. PAVER assumes the pathway analysis was properly performed [@9tips].

![PAVER uses numerical representations of pathways to find functionally related clusters.\label{fig:overview}](figures/overview.png)

- PAVER was designed to be easy to use by researchers with minimal coding experience. PAVER has already been used in a number of scientific publications to aid in the interpretation of pathway analyses [@william_ryan_2023_8156248; @RN78]. We have pre-computed vector representations for Gene Ontology [@RN68] using the recent anc2vec model [@RN13], available here: https://github.com/willgryan/PAVER_embeddings. However, embeddings for any pathway database or ontology can be used with PAVER.
+ PAVER was designed to be easy to use by researchers with minimal programming experience. PAVER has already been used in a number of scientific publications to aid in the interpretation of pathway analyses [@william_ryan_2023_8156248; @RN78]. We have pre-computed vector representations for Gene Ontology [@RN68] using the recent anc2vec model [@RN13], available here: https://github.com/willgryan/PAVER_embeddings. However, embeddings for any pathway database or ontology can be used with PAVER.

# Licensing, Availability and Usage

- The PAVER R package is licensed under the GNU General Public License v3.0. It can be installed using remotes::install_github("willgryan/PAVER"). All code, including an instructional vignette with example data, is open-source and hosted on GitHub. Report bugs using the issue tracker at https://github.com/willgryan/PAVER/issues/.
+ The PAVER R package is licensed under the GNU General Public License v3.0. It can be installed using `remotes::install_github("willgryan/PAVER")`. All code, including an instructional vignette with example data, is open-source and hosted on GitHub. Bug reports and feature requests can be submitted using the issue tracker at [https://github.com/willgryan/PAVER/issues/](https://github.com/willgryan/PAVER/issues/).

# Acknowledgements

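The cluster-labelling step described in the Overview paragraph of this diff (average the embeddings of a cluster's pathways, then label the cluster with the pathway closest to that average) can be sketched as follows. This is a minimal illustration in Python, not PAVER's R implementation. The function names (`mean_embedding`, `cosine_similarity`, `representative_pathway`, `label_clusters`) are hypothetical. The paper does not say which similarity measure is used, so cosine similarity is an assumption here. The toy three-dimensional embeddings are made up. The sketch also assumes hierarchical clustering has already assigned each pathway to a cluster.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration only; these are not PAVER functions.


def mean_embedding(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise average of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[i] for vec in vectors) / n for i in range(dim)]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between a and b (0.0 if either is a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def representative_pathway(members: Dict[str, Sequence[float]]) -> str:
    """Pick the member pathway whose embedding is most similar to the cluster average.

    Assumption: "most similar" is measured with cosine similarity.
    """
    centroid = mean_embedding(list(members.values()))
    return max(members, key=lambda name: cosine_similarity(members[name], centroid))


def label_clusters(
    embeddings: Dict[str, Sequence[float]], assignments: Dict[str, int]
) -> Dict[int, str]:
    """Group pathways by cluster id, then label each cluster with its representative."""
    clusters: Dict[int, Dict[str, Sequence[float]]] = {}
    for name, cluster_id in assignments.items():
        clusters.setdefault(cluster_id, {})[name] = embeddings[name]
    return {cid: representative_pathway(members) for cid, members in clusters.items()}


# Toy 3-dimensional embeddings and cluster assignments (made up for illustration).
toy_embeddings = {
    "pathway_a": [1.0, 0.0, 0.0],
    "pathway_b": [0.9, 0.1, 0.0],
    "pathway_c": [0.8, 0.2, 0.0],
    "pathway_d": [0.0, 0.0, 1.0],
    "pathway_e": [0.0, 0.1, 0.9],
    "pathway_f": [0.0, 0.2, 0.8],
}
toy_assignments = {
    "pathway_a": 1, "pathway_b": 1, "pathway_c": 1,
    "pathway_d": 2, "pathway_e": 2, "pathway_f": 2,
}

labels = label_clusters(toy_embeddings, toy_assignments)
print(labels)  # {1: 'pathway_b', 2: 'pathway_e'}
```

In each toy cluster the middle pathway sits at the cluster average, so it is chosen as the label. Cosine similarity ignores vector length and compares only direction; swapping in another similarity measure would only require changing `cosine_similarity`.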
