Add details on controlling pathway clustering to the vignette

Quieted some function calls
willgryan · Feb 2, 2024 · fcffec2
1 parent 7a284a6
commit fcffec2

Showing 3 changed files with 15 additions and 6 deletions.
diff --git a/R/generate_themes.R b/R/generate_themes.R
@@ -24,11 +24,11 @@ generate_themes <-
     clust = stats::hclust(D_sim, method = hclust_method)
 
     #Dynamic tree cut to generate clusters
-    clustering = dynamicTreeCut::cutreeDynamic(clust, distM = as.matrix(D_sim), ...) %>%
+    clustering = dynamicTreeCut::cutreeDynamic(clust, distM = as.matrix(D_sim), verbose = 0, ...) %>%
       purrr::set_names(clust$labels) %>%
       tibble::enframe(name = "UniqueID", value = "Cluster") %>%
       dplyr::mutate(Cluster = as.factor(.data$Cluster)) %>%
-      dplyr::inner_join(PAVER_result$embedding_mat)
+      dplyr::inner_join(PAVER_result$embedding_mat, by = "UniqueID")
 
     #Average the embeddings within each cluster
     avg_cluster_embeddings = clustering %>%

diff --git a/R/prepare_data.R b/R/prepare_data.R
@@ -31,7 +31,7 @@ prepare_data <- function(input, embeddings, term2name) {
   #Generate an embedding table keyed by unique pathway IDs
   embedding_mat = embeddings[prepared_data$GOID,] %>%
     magrittr::set_rownames(prepared_data$UniqueID) %>%
-    tibble::as_tibble(rownames = "UniqueID", .name_repair = "universal")
+    tibble::as_tibble(rownames = "UniqueID", .name_repair = "universal_quiet")
 
   #Compute the UMAP of the embedding matrix
   custom.config = umap::umap.defaults

diff --git a/vignettes/PAVER.Rmd b/vignettes/PAVER.Rmd
@@ -35,14 +35,23 @@ embeddings = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main
 term2name = readRDS(url("https://github.com/willgryan/PAVER_embeddings/raw/main/2023-03-06/term2name_2023-03-06.RDS"))
 
 PAVER_result = prepare_data(input, embeddings, term2name)
-
 ```
 
 # Identifying and Naming Pathways Clusters
 
-After preparing your data, PAVER can generate a set of pathway clusters and identify the most representative pathway (theme) for each cluster. The following code chunk demonstrates how to generate pathway clusters using the example data provided in the PAVER package. To control the minimum number of pathways in a cluster, we pass the `minClusterSize` argument to (dynamicTreeCut)[https://cran.r-project.org/package=dynamicTreeCut].
+After preparing your data, PAVER can generate a set of pathway clusters and identify the most representative pathway (theme) for each cluster. The following code chunk demonstrates how to generate pathway clusters using the example data provided in the PAVER package. To constrain the pathway clustering, we pass the following arguments to [dynamicTreeCut](https://cran.r-project.org/package=dynamicTreeCut). Increasing `minClusterSize` will result in fewer clusters, while increasing `maxCoreScatter` will result in more clusters.
+<!-- https://stackoverflow.com/questions/19734381/cutting-dendrogram-into-n-trees-with-minimum-cluster-size-in-r -->
 ```{r}
-PAVER_result = generate_themes(PAVER_result, minClusterSize = 40)
+minClusterSize = 5
+maxCoreScatter = 0.33
+minGap = (1 - maxCoreScatter) * 3 / 4
+PAVER_result = generate_themes(
+  PAVER_result,
+  maxCoreScatter = maxCoreScatter,
+  minGap = minGap,
+  minClusterSize = minClusterSize
+)
+# 
 ```
 
 # Visualization
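
Note on the vignette chunk in this diff: `minGap` is not chosen independently but computed from `maxCoreScatter` as `(1 - maxCoreScatter) * 3 / 4`. The sketch below wraps that relation in a small helper so the two values stay in sync when `maxCoreScatter` is tuned. The helper name `derive_min_gap` is hypothetical and not part of PAVER or dynamicTreeCut, and the range check assumes `maxCoreScatter` is meant to lie in [0, 1].

```r
# Hypothetical helper (not part of PAVER or dynamicTreeCut):
# derive minGap from maxCoreScatter with the same formula as the vignette chunk.
derive_min_gap <- function(max_core_scatter) {
  # Assumption: maxCoreScatter is a fraction between 0 and 1.
  stopifnot(
    is.numeric(max_core_scatter),
    all(max_core_scatter >= 0 & max_core_scatter <= 1)
  )
  (1 - max_core_scatter) * 3 / 4
}

# Values from the vignette chunk
max_core_scatter <- 0.33
min_gap <- derive_min_gap(max_core_scatter)

# A larger maxCoreScatter yields a smaller minGap under this relation
gaps <- derive_min_gap(c(0.33, 0.5, 0.9))
```

These values could then be passed to `generate_themes()` through `maxCoreScatter` and `minGap`, as the vignette chunk does.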