Commit
Wrap up paper.
willgryan committed Oct 3, 2023
1 parent 315e42a commit 1335fce
Showing 2 changed files with 72 additions and 4 deletions.
68 changes: 68 additions & 0 deletions joss/paper.bib
@@ -449,3 +449,71 @@ @software{william_ryan_2023_8156248
url = {https://doi.org/10.5281/zenodo.8156248}
}

@article{RN68,
author = {Ashburner, M. and Ball, C. A. and Blake, J. A. and Botstein, D. and Butler, H. and Cherry, J. M. and Davis, A. P. and Dolinski, K. and Dwight, S. S. and Eppig, J. T. and Harris, M. A. and Hill, D. P. and Issel-Tarver, L. and Kasarskis, A. and Lewis, S. and Matese, J. C. and Richardson, J. E. and Ringwald, M. and Rubin, G. M. and Sherlock, G.},
title = {Gene ontology: tool for the unification of biology. The Gene Ontology Consortium},
journal = {Nat Genet},
volume = {25},
number = {1},
pages = {25-9},
note = {1546-1718
Ashburner, M
Ball, C A
Blake, J A
Botstein, D
Butler, H
Cherry, J M
Davis, A P
Dolinski, K
Dwight, S S
Eppig, J T
Harris, M A
Hill, D P
Issel-Tarver, L
Kasarskis, A
Lewis, S
Matese, J C
Richardson, J E
Ringwald, M
Rubin, G M
Sherlock, G
U41 HG001315/HG/NHGRI NIH HHS/United States
HD33745/HD/NICHD NIH HHS/United States
P41 HG000330/HG/NHGRI NIH HHS/United States
U41 HG000739/HG/NHGRI NIH HHS/United States
P41 HG001315/HG/NHGRI NIH HHS/United States
R01 HD033745/HD/NICHD NIH HHS/United States
P41 HG000739-19/HG/NHGRI NIH HHS/United States
P41 HG001315-16/HG/NHGRI NIH HHS/United States
P41 HG00330/HG/NHGRI NIH HHS/United States
R01 HD033745-11/HD/NICHD NIH HHS/United States
P41 HG000330-22/HG/NHGRI NIH HHS/United States
P41 HG01315/HG/NHGRI NIH HHS/United States
P41 HG000739/HG/NHGRI NIH HHS/United States
Journal Article
Research Support, Non-U.S. Gov't
Research Support, U.S. Gov't, P.H.S.
United States
2000/05/10
Nat Genet. 2000 May;25(1):25-9. doi: 10.1038/75556.},
abstract = {Genomic sequencing has made it clear that a large fraction of the genes specifying the core biological functions are shared by all eukaryotes. Knowledge of the biological role of such shared proteins in one organism can often be transferred to other organisms. The goal of the Gene Ontology Consortium is to produce a dynamic, controlled vocabulary that can be applied to all eukaryotes even as knowledge of gene and protein roles in cells is accumulating and changing. To this end, three independent ontologies accessible on the World-Wide Web (http://www.geneontology.org) are being constructed: biological process, molecular function and cellular component.},
keywords = {Animals
Computer Communication Networks
Databases, Factual
Eukaryotic Cells/*physiology
*Genes
Humans
Metaphysics
Mice
Molecular Biology/*trends
*Sequence Analysis, DNA
*Terminology as Topic},
ISSN = {1061-4036 (Print)
1061-4036},
DOI = {10.1038/75556},
year = {2000},
type = {Journal Article}
}



8 changes: 4 additions & 4 deletions joss/paper.md
@@ -46,15 +46,15 @@ Multiomics is used extensively in biological research today. However, the develo

# Overview

PAVER uses vector embeddings to help interpret pathway analyses. Embeddings encode the meaning of pathways into numerical representations which can then be clustered and visualized (\autoref{fig:overview}). To identify which pathway is most representative of a cluster, PAVER first takes the average embedding of all pathways in a cluster to capture its overall meaning [@RN49]. It then finds which pathway is most similar to the average embedding and labels the cluster with that pathway. This allows PAVER to automatically curate long lists of pathways into groups and identify which pathway is most representative of each group.
PAVER uses vector embeddings to help interpret pathway analyses. Embeddings encode the meaning of pathways into numerical representations which can then be clustered and visualized (\autoref{fig:overview}). To identify which pathway is most representative of a cluster, PAVER first takes the average embedding of all pathways in a cluster to capture its overall meaning in a single numerical representation [@RN49]. It then finds which pathway is most similar to the average embedding and labels the cluster with that pathway. This allows PAVER to automatically curate long lists of pathways into groups and identify which pathway is most representative of each group.
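
The cluster-labeling step described in this paragraph can be illustrated with a minimal Python sketch. It is not taken from the PAVER package (which is written in R) or from this commit. The function names, the toy two-dimensional embeddings, and the use of cosine similarity as the similarity measure are all assumptions made for illustration; the paragraph only says the chosen pathway is the one "most similar" to the average embedding.

```python
import math


def mean_vector(vectors):
    """Element-wise average of a list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def representative_pathway(cluster):
    """Return the pathway whose embedding is closest to the cluster average.

    `cluster` maps a pathway name to its embedding (a list of floats).
    """
    centroid = mean_vector(list(cluster.values()))
    return max(cluster, key=lambda name: cosine_similarity(cluster[name], centroid))


# Hypothetical toy embeddings; real pathway embeddings have many more dimensions.
toy_cluster = {
    "pathway_a": [1.0, 0.0],
    "pathway_b": [0.0, 1.0],
    "pathway_c": [1.0, 1.0],
}
label = representative_pathway(toy_cluster)
print(label)
```

In this toy cluster the average embedding points along the diagonal, so `pathway_c` is selected as the label. The sketch assumes every embedding is non-zero; a zero vector would cause a division by zero in `cosine_similarity`.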

![PAVER uses numerical representations of pathways to find functionally related clusters.\label{fig:overview}](figures/overview.png)

PAVER was designed to be easy to use by researchers and students with minimal coding experience. PAVER has already been used in a number of scientific publications to aid in the interpretation of pathway analyses [@william_ryan_2023_8156248; @RN78]. We have pre-computed vector representations for Gene Ontology using the recent anc2vec model [@RN13], available here: https://github.com/willgryan/PAVER_embeddings. However, embeddings for any pathway database can be used with PAVER.
PAVER was designed to be easy to use by researchers and students with minimal coding experience. PAVER has already been used in a number of scientific publications to aid in the interpretation of pathway analyses [@william_ryan_2023_8156248; @RN78]. We have pre-computed vector representations for Gene Ontology [@RN68] using the recent anc2vec model [@RN13], available here: https://github.com/willgryan/PAVER_embeddings. However, embeddings for any pathway database can be used with PAVER.

# Licensing and Availability
# Licensing, Availability and Usage

The PAVER R package is licensed under the GNU General Public License v3.0. It can be installed using remotes::install_github("willgryan/PAVER"). All code, including a vignette with an example dataset, is open-source and hosted on GitHub. Report bugs using the issue tracker at https://github.com/willgryan/PAVER/issues/.
The PAVER R package is licensed under the GNU General Public License v3.0. It can be installed using remotes::install_github("willgryan/PAVER"). All code, including an instructional vignette with an example dataset, is open-source and hosted on GitHub. Report bugs using the issue tracker at https://github.com/willgryan/PAVER/issues/.

# Acknowledgements
