Use mock data in tests
Wrap up vignette and paper
willgryan committed Oct 6, 2023
1 parent 08da313 commit 32f0d56
Showing 11 changed files with 232 additions and 114 deletions.
4 changes: 2 additions & 2 deletions R/PAVER_theme_plot.R
@@ -14,9 +14,9 @@
PAVER_theme_plot <- function(PAVER_result) {

plot = PAVER_result$umap$layout %>%
- tibble::as_tibble(rownames = NA) %>%
+ tibble::as_tibble(rownames = NA, .name_repair = "universal") %>%
tibble::rownames_to_column("UniqueID") %>%
- dplyr::rename(UMAP1 = "V1", UMAP2 = "V2") %>%
+ dplyr::rename_with(.cols = 2:3, ~ c("UMAP1", "UMAP2")) %>%
dplyr::inner_join(PAVER_result$clustering %>%
dplyr::select(.data$UniqueID, .data$Group, .data$Cluster), by = "UniqueID") %>%
ggplot2::ggplot(ggplot2::aes(x = .data$UMAP1,
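
The change above stops relying on the auto-generated column names `V1` and `V2` and renames the two coordinate columns by position instead. A minimal sketch of that pattern follows. It assumes the `tibble` and `dplyr` packages are installed, and the `layout` matrix is a made-up stand-in for `PAVER_result$umap$layout`, not real PAVER output.

```r
library(tibble)
library(dplyr)

# Made-up stand-in for the UMAP layout: a matrix with row names
# but no column names (an assumption for illustration).
layout <- matrix(
  c(0.1, 0.2, 0.3, 0.4),
  nrow = 2,
  dimnames = list(c("GO:0000001", "GO:0000002"), NULL)
)

coords <- layout %>%
  # "universal" name repair assigns syntactic names to the unnamed columns
  as_tibble(rownames = NA, .name_repair = "universal") %>%
  # Move the row names into a first column called UniqueID
  rownames_to_column("UniqueID") %>%
  # Rename columns 2 and 3 by position, whatever the repaired names are
  rename_with(.cols = 2:3, ~ c("UMAP1", "UMAP2"))

names(coords)
```

Renaming by position keeps the pipeline working whichever names the repair step produces, at the cost of assuming the layout always has exactly two coordinate columns.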
10 changes: 7 additions & 3 deletions joss/paper.md
@@ -38,15 +38,15 @@ affiliations:

# Summary

- Omics experiments are commonly used to predict changes in pathways underlying phenotypes. However, the results of these experiments are often long lists of pathways that are difficult to interpret. PAVER is an R package that automatically curates long lists of pathways into groups, identifies which pathway is most representative of each group, and provides publication-ready intuitive visualizations. PAVER makes it easy to integrate multiple pathway analyses, identify relevant biological insights and can work with any pathway database.
+ Omics studies are commonly used to predict changes in biological pathways underlying phenotypes. However, the results of omics experiments can be long lists of pathways that are difficult to interpret. PAVER is an R package that automatically curates long lists of pathways into groups, identifies which pathway is most representative of each group, and provides publication-ready intuitive visualizations. PAVER makes it easy to integrate multiple pathway analyses, identify relevant biological insights and can work with any pathway database.

# Statement of Need

- Multiomics is used extensively in biological research today. However, the development of omics technologies has vastly outpaced the expertise of researchers in its analysis, and the resulting “data deluge” now overwhelms the capacity of human cognition [@RN16; @RN20; @RN19]. Analysis of omics data is therefore the major bottleneck in most research projects today and its use in precision medicine remains limited accordingly [@RN26; @RN63]. Pathway analysis has since become ubiquitous to help interpret omics data and elucidate mechanisms of biological phenomena under study [@RN6]. Despite the last decade bringing a host of different computational tools to perform pathway analysis, they each generally result in lists of results too long to manually inspect and extract relevant targets for downstream wet lab validation without introducing biases [@RN5; @RN81]. Interpretation of results is accordingly the greatest expense in any omics project [@RN21]. With the total volume of omics data continuing to grow, novel ways of data management are needed [@RN22]. FAIR (Findable, Accessible, Interoperable, Reusable) scientific data principles necessitate automated interpretation of omics results [@RN25].
+ Omics is used extensively in biological research today. However, the development of omics technologies has vastly outpaced the expertise of researchers in its analysis, and the resulting “data deluge” now overwhelms the capacity of human cognition [@RN16; @RN20; @RN19]. Analysis of omics data is therefore the major bottleneck in most research projects today and its use in precision medicine remains limited accordingly [@RN26; @RN63]. Pathway analysis has since become ubiquitous to help interpret omics data and elucidate mechanisms of biological phenomena under study [@RN6]. Despite the last decade bringing a host of different computational tools to perform pathway analysis, they each generally result in lists of results too long to manually inspect and extract relevant targets for downstream wet lab validation without introducing biases [@RN5; @RN81]. Interpretation of results is accordingly the greatest expense in any omics project [@RN21]. With the total volume of omics data continuing to grow, novel ways of data management are needed [@RN22]. FAIR (Findable, Accessible, Interoperable, Reusable) scientific data principles necessitate automated interpretation of omics results [@RN25].

# Overview

- PAVER uses vector embeddings to help interpret pathway analyses. Embeddings encode the meaning of pathways into numerical representations which can then be clustered and visualized (\autoref{fig:overview}). To identify which pathway is most representative of a cluster, PAVER first takes the average embedding of all pathways in a cluster to capture it's overall meaning into a single numerical representation [@RN49]. It then finds which pathway is most similar to the average embedding and labels the cluster with that pathway. This allows PAVER to automatically curate long lists of pathways into groups and identify which pathway is most representative of each group.
+ PAVER uses vector embeddings to help interpret pathway analyses. Embeddings encode the meaning of pathways into numerical representations which can then be hierarchically clustered and visualized (\autoref{fig:overview}). To identify which pathway is most representative of a cluster, PAVER first takes the average embedding of all pathways in a cluster to capture it's overall meaning into a single numerical representation [@RN49]. It then finds which pathway is most similar to the average embedding and labels the cluster with that pathway. This allows PAVER to automatically curate long lists of pathways into groups and identify which pathway is most representative of each group.

![PAVER uses numerical representations of pathways to find functionally related clusters.\label{fig:overview}](figures/overview.png)
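
The representative-selection step described in the Overview can be sketched in base R. This is a minimal illustration, not PAVER's implementation: the toy embeddings, the cluster assignments, and the helper names `cosine` and `representative` are assumptions, and cosine similarity stands in for whichever similarity measure the package actually uses.

```r
# Toy embeddings: 6 pathways x 3 dimensions (random values, for illustration only)
set.seed(1)
emb <- matrix(
  rnorm(18),
  nrow = 6,
  dimnames = list(paste0("pathway_", 1:6), NULL)
)

# Hypothetical cluster assignment for each pathway
cluster <- c(1, 1, 1, 2, 2, 2)

# Cosine similarity between two numeric vectors (assumed similarity measure)
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Return the member pathway closest to the cluster's average embedding
representative <- function(emb, members) {
  sub <- emb[members, , drop = FALSE]
  centroid <- colMeans(sub)                    # average embedding of the cluster
  sims <- apply(sub, 1, cosine, b = centroid)  # similarity of each member to it
  names(which.max(sims))                       # most similar member labels the cluster
}

# One representative pathway name per cluster
reps <- vapply(
  split(rownames(emb), cluster),
  function(ids) representative(emb, ids),
  character(1)
)
```

Each entry of `reps` is the name of an existing pathway in that cluster, so the cluster label is always a real term rather than a synthetic one.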

@@ -60,4 +60,8 @@ The PAVER R package is licensed under the GNU General Public License v3.0. It ca

This work was supported by NIH T32-G-RISE grant number 1T32GM144873-01.

+ # Disclosure

+ The authors declare no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

# References
35 changes: 24 additions & 11 deletions tests/testthat/test-PAVER_combined_plot.R
@@ -4,19 +4,32 @@ library(ggpubr)

test_that("PAVER_combined_plot works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- PAVER_result <- generate_themes(PAVER_result, minClusterSize = 40)
+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

# Run the function and catch the result
- p <- PAVER_combined_plot(PAVER_result)
+ p <- PAVER_combined_plot(result)

# Verify the function runs and produces a ggplot object
expect_s3_class(p, "gg")
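
The same mock-data block is repeated in every test file in this commit, and the `rnorm()` calls are unseeded, so each run sees different values. One possible way to centralize and seed it is sketched below. The helper name `make_mock_paver_inputs()` and its arguments are hypothetical and not part of PAVER.

```r
# Hypothetical helper; the name and arguments are illustrative, not part of PAVER.
make_mock_paver_inputs <- function(n_terms = 250, n_dims = 10, seed = 42) {
  set.seed(seed)  # fix the RNG so every call yields the same mock values
  ids <- paste0("GO:", sprintf("%07d", seq_len(n_terms)))

  # Mock enrichment input: one row per term, one numeric column per group
  input <- data.frame(
    GOID = ids,
    GroupA = rnorm(n_terms),
    GroupB = rnorm(n_terms),
    GroupC = rnorm(n_terms)
  )

  # Mock embeddings: one row per term, keyed by the same IDs
  embeddings <- matrix(rnorm(n_terms * n_dims), n_terms, n_dims)
  rownames(embeddings) <- ids

  # Mock ID-to-name lookup table
  term2name <- data.frame(
    GOID = ids,
    TermName = paste0("Term ", seq_len(n_terms))
  )

  list(input = input, embeddings = embeddings, term2name = term2name)
}

mock <- make_mock_paver_inputs()
```

testthat sources files in `tests/testthat/` whose names start with `helper` before running the tests, so a function like this could live in one such file and be shared by all of them. Note that `set.seed()` inside the helper also resets the global RNG state for whatever code runs after it.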
35 changes: 24 additions & 11 deletions tests/testthat/test-PAVER_export.R
@@ -5,19 +5,32 @@ library(tibble)

test_that("PAVER_export works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- PAVER_result <- generate_themes(PAVER_result, minClusterSize = 40)
+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

# Test PAVER_export function
- export_result <- PAVER_export(PAVER_result)
+ export_result <- PAVER_export(result)

# Verify the structure and content of the output
expect_s3_class(export_result, "tbl_df")
35 changes: 24 additions & 11 deletions tests/testthat/test-PAVER_hunter_plot.R
@@ -4,19 +4,32 @@ library(ggpubr)

test_that("PAVER_hunter_plot works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- PAVER_result <- generate_themes(PAVER_result, minClusterSize = 40)
+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

# Run the function and catch the result
- p <- PAVER_hunter_plot(PAVER_result)
+ p <- PAVER_hunter_plot(result)

# Verify the function runs and produces a ggplot object
expect_s4_class(p, "HeatmapList")
35 changes: 24 additions & 11 deletions tests/testthat/test-PAVER_interpretation_plot.R
@@ -4,19 +4,32 @@ library(ggpubr)

test_that("PAVER_interpretation_plot works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- PAVER_result <- generate_themes(PAVER_result, minClusterSize = 40)
+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

# Run the function and catch the result
- p <- PAVER_interpretation_plot(PAVER_result)
+ p <- PAVER_interpretation_plot(result)

# Verify the function runs and produces a ggplot object
expect_s3_class(p, "gg")
35 changes: 24 additions & 11 deletions tests/testthat/test-PAVER_regulation_plot.R
@@ -4,19 +4,32 @@ library(ggpubr)

test_that("PAVER_regulation_plot works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- PAVER_result <- generate_themes(PAVER_result, minClusterSize = 40)
+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

# Run the function and catch the result
- p <- PAVER_regulation_plot(PAVER_result)
+ p <- PAVER_regulation_plot(result)

# Verify the function runs and produces a ggplot object
expect_s3_class(p, "gg")
35 changes: 24 additions & 11 deletions tests/testthat/test-PAVER_theme_plot.R
@@ -4,19 +4,32 @@ library(ggpubr)

test_that("PAVER_theme_plot works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- PAVER_result <- generate_themes(PAVER_result, minClusterSize = 40)
+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

# Run the function and catch the result
- p <- PAVER_theme_plot(PAVER_result)
+ p <- PAVER_theme_plot(result)

# Verify the function runs and produces a ggplot object
expect_s3_class(p, "gg")
57 changes: 35 additions & 22 deletions tests/testthat/test-generate_themes.R
@@ -7,27 +7,40 @@ library(umap)
library(dynamicTreeCut)
library(randomcoloR)

- test_that("generate_themes works correctly", {

- #Use vignette example data
- input = gsea_example

- embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/embeddings_2023-03-06.RDS"))

- term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))

- PAVER_result = prepare_data(input, embeddings, term2name)

- # Test generate_themes function
- result <- generate_themes(PAVER_result, minClusterSize = 40)

- # Verify output structure
- expect_type(result, "list")
- expect_named(result, c("prepared_data", "embedding_mat", "umap", "goterms_df", "clustering", "avg_cluster_embeddings", "mds", "colors"))
- expect_s3_class(result$clustering, "tbl_df")
- expect_s3_class(result$avg_cluster_embeddings, "tbl_df")
- expect_s3_class(result$mds, "smacof")
- expect_type(result$colors, "character")

+ # Mock unit test for the generate_themes function
+ test_that("generate_themes function works with mock PAVER_result", {

+ #Mock input data
+ mock_input <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ GroupA = rnorm(250),
+ GroupB = rnorm(250),
+ GroupC = rnorm(250)
+ )

+ # Mock embeddings data
+ mock_embeddings <- matrix(rnorm(250 * 10), 250, 10)
+ rownames(mock_embeddings) <- paste0("GO:", sprintf("%07d", 1:250))

+ # Mock term2name data
+ mock_term2name <- data.frame(
+ GOID = paste0("GO:", sprintf("%07d", 1:250)),
+ TermName = paste0("Term ", 1:250)
+ )

+ # Generate the mock PAVER_result using the prepare_data function
+ mock_PAVER_result <- prepare_data(mock_input, mock_embeddings, mock_term2name)

+ # Run the generate_themes function with the mock PAVER_result
+ result <- generate_themes(mock_PAVER_result)

+ # Test that result is a list
+ expect_true(is.list(result))

+ # Test that the result contains expected elements
+ expect_true("clustering" %in% names(result))
+ expect_true("avg_cluster_embeddings" %in% names(result))
+ expect_true("mds" %in% names(result))
+ expect_true("colors" %in% names(result))
})
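
The rewritten test checks each expected element with a separate `expect_true(... %in% names(result))` call, which on failure does not say which names are absent. A small base R sketch of a check that reports the missing names directly follows. The helper `missing_elements()` is hypothetical, and the `result` list is a placeholder standing in for the output of `generate_themes()`.

```r
# Hypothetical helper: which expected names are absent from a list?
missing_elements <- function(x, expected) setdiff(expected, names(x))

expected <- c("clustering", "avg_cluster_embeddings", "mds", "colors")

# Placeholder standing in for generate_themes() output (contents are dummies)
result <- list(
  clustering = data.frame(),
  avg_cluster_embeddings = data.frame(),
  mds = list(),
  colors = character()
)

# Empty when every expected element is present
missing_elements(result, expected)

# Names that are absent when only "mds" is supplied
missing_elements(result["mds"], expected)
```

Inside a testthat test, comparing `missing_elements(result, expected)` against `character(0)` with `expect_identical()` would surface the absent names in the failure message.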
