Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Error during extract and reformat enrichment results when using registration_wrapper #72

Open
boyiguo1 opened this issue Jan 2, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@boyiguo1
Copy link

boyiguo1 commented Jan 2, 2024

Describe the bug

Thanks for creating and maintaining the awesome spatialLIBD package. Recently, I was trying to conduct spatial registration by running registration_wrapper function on a SpatialExperiment object. I encountered the error message Error in combn(colnames(registration_model)[regis_cols], 2) : n < m (please find the reproducible example below), which I have trouble interpreting the error message and fix it.

The spatialLIBD is at v1.14.1 installed via Bioconductor version '3.18'

Provide a minimally reproducible example (reprex)

# Load packages -----------------------------------------------------------
library(here)
#> here() starts at /private/var/folders/ff/m8yvwsq51sjg43xscd3x1tmr0000gn/T/RtmpWA9JNL/reprex-774f49f557da-stony-moth
library(SpatialExperiment)
#> Loading required package: SingleCellExperiment
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: 'MatrixGenerics'
#> The following objects are masked from 'package:matrixStats':
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> Loading required package: GenomicRanges
#> Loading required package: stats4
#> Loading required package: BiocGenerics
#> 
#> Attaching package: 'BiocGenerics'
#> The following objects are masked from 'package:stats':
#> 
#>     IQR, mad, sd, var, xtabs
#> The following objects are masked from 'package:base':
#> 
#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,
#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,
#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,
#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,
#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, sort,
#>     table, tapply, union, unique, unsplit, which.max, which.min
#> Loading required package: S4Vectors
#> 
#> Attaching package: 'S4Vectors'
#> The following object is masked from 'package:utils':
#> 
#>     findMatches
#> The following objects are masked from 'package:base':
#> 
#>     expand.grid, I, unname
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> Loading required package: Biobase
#> Welcome to Bioconductor
#> 
#>     Vignettes contain introductory material; view with
#>     'browseVignettes()'. To cite Bioconductor, see
#>     'citation("Biobase")', and for packages 'citation("pkgname")'.
#> 
#> Attaching package: 'Biobase'
#> The following object is masked from 'package:MatrixGenerics':
#> 
#>     rowMedians
#> The following objects are masked from 'package:matrixStats':
#> 
#>     anyMissing, rowMedians
library(spatialLIBD)
library(SingleR)
# library(tidyverse)


# Load spatialDLPFC data ---------------------------------------------------

spe <- fetch_data(type = "spatialDLPFC_Visium")
#> 2024-01-02 13:33:34.255759 loading file /Users/bguo6/Library/Caches/org.R-project.R/R/BiocFileCache/5449288a26e9_spe_filtered_final_with_clusters_and_deconvolution_results.rds%3Fdl%3D1

set.seed(1)
scr_idx <- sample(unique(spe$sample_id), 15)
tar_idx <- setdiff(unique(spe$sample_id), scr_idx)

src_spe <- spe[,spe$sample_id %in% scr_idx]
tar_spe <- spe[,spe$sample_id %in% tar_idx]

# Spatial Registration ----------------------------------------------------

## Get enrichment statistics from source ---------------------------------
sce_modeling_results <- registration_wrapper(
  sce = src_spe,
  # sce = src_spe,
  var_registration = "BayesSpace_harmony_09",
  var_sample_id = "sample_id",
  gene_ensembl = "gene_id",
  gene_name = "gene_name"
)
#> 2024-01-02 13:34:59.526861 make pseudobulk object
#> 2024-01-02 13:35:07.619735 dropping 0 pseudo-bulked samples that are below 'min_ncells'.
#> 2024-01-02 13:35:07.650823 drop lowly expressed genes
#> 2024-01-02 13:35:07.781492 normalize expression
#> 2024-01-02 13:35:08.545616 create model matrix
#> 2024-01-02 13:35:08.696194 run duplicateCorrelation()
#> 2024-01-02 13:35:26.808509 The estimated correlation is: 0.960849320333061
#> 2024-01-02 13:35:26.811491 computing enrichment statistics
#> 2024-01-02 13:35:28.62727 extract and reformat enrichment results
#> Error in combn(colnames(registration_model)[regis_cols], 2): n < m

Created on 2024-01-02 with reprex v2.0.2

Expected behavior

I would expect that registration_wrapper would run successfully.

R Session Information

Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.2 (2023-10-31)
#>  os       macOS Sonoma 14.2.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/New_York
#>  date     2024-01-02
#>  pandoc   3.1.1 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package                * version   date (UTC) lib source
#>  abind                    1.4-5     2016-07-21 [1] CRAN (R 4.3.0)
#>  AnnotationDbi            1.64.1    2023-11-02 [1] Bioconductor
#>  AnnotationHub            3.10.0    2023-10-26 [1] Bioconductor
#>  attempt                  0.3.1     2020-05-03 [1] CRAN (R 4.3.0)
#>  beachmat                 2.18.0    2023-10-24 [1] Bioconductor
#>  beeswarm                 0.4.0     2021-06-01 [1] CRAN (R 4.3.0)
#>  benchmarkme              1.0.8     2022-06-12 [1] CRAN (R 4.3.0)
#>  benchmarkmeData          1.0.4     2020-04-23 [1] CRAN (R 4.3.0)
#>  Biobase                * 2.62.0    2023-10-26 [1] Bioconductor
#>  BiocFileCache            2.10.1    2023-10-26 [1] Bioconductor
#>  BiocGenerics           * 0.48.1    2023-11-02 [1] Bioconductor
#>  BiocIO                   1.12.0    2023-10-26 [1] Bioconductor
#>  BiocManager              1.30.22   2023-08-08 [1] CRAN (R 4.3.0)
#>  BiocNeighbors            1.20.1    2023-12-18 [1] Bioconductor 3.18 (R 4.3.0)
#>  BiocParallel             1.36.0    2023-10-26 [1] Bioconductor
#>  BiocSingular             1.18.0    2023-10-24 [1] Bioconductor
#>  BiocVersion              3.18.1    2023-11-18 [1] Bioconductor 3.18 (R 4.3.2)
#>  Biostrings               2.70.1    2023-10-26 [1] Bioconductor
#>  bit                      4.0.5     2022-11-15 [1] CRAN (R 4.3.0)
#>  bit64                    4.0.5     2020-08-30 [1] CRAN (R 4.3.0)
#>  bitops                   1.0-7     2021-04-24 [1] CRAN (R 4.3.0)
#>  blob                     1.2.4     2023-03-17 [1] CRAN (R 4.3.0)
#>  bslib                    0.6.1     2023-11-28 [1] CRAN (R 4.3.1)
#>  cachem                   1.0.8     2023-05-01 [1] CRAN (R 4.3.0)
#>  cli                      3.6.2     2023-12-11 [1] CRAN (R 4.3.1)
#>  codetools                0.2-19    2023-02-01 [1] CRAN (R 4.3.2)
#>  colorspace               2.1-0     2023-01-23 [1] CRAN (R 4.3.0)
#>  config                   0.3.2     2023-08-30 [1] CRAN (R 4.3.0)
#>  cowplot                  1.1.2     2023-12-15 [1] CRAN (R 4.3.1)
#>  crayon                   1.5.2     2022-09-29 [1] CRAN (R 4.3.0)
#>  curl                     5.2.0     2023-12-08 [1] CRAN (R 4.3.1)
#>  data.table               1.14.10   2023-12-08 [1] CRAN (R 4.3.1)
#>  DBI                      1.2.0     2023-12-21 [1] CRAN (R 4.3.1)
#>  dbplyr                   2.4.0     2023-10-26 [1] CRAN (R 4.3.1)
#>  DelayedArray             0.28.0    2023-10-24 [1] Bioconductor
#>  DelayedMatrixStats       1.24.0    2023-10-24 [1] Bioconductor
#>  digest                   0.6.33    2023-07-07 [1] CRAN (R 4.3.0)
#>  doParallel               1.0.17    2022-02-07 [1] CRAN (R 4.3.0)
#>  dotCall64                1.1-1     2023-11-28 [1] CRAN (R 4.3.1)
#>  dplyr                    1.1.4     2023-11-17 [1] CRAN (R 4.3.1)
#>  DT                       0.31      2023-12-09 [1] CRAN (R 4.3.1)
#>  edgeR                    4.0.3     2023-12-09 [1] Bioconductor 3.18 (R 4.3.2)
#>  ellipsis                 0.3.2     2021-04-29 [1] CRAN (R 4.3.0)
#>  evaluate                 0.23      2023-11-01 [1] CRAN (R 4.3.1)
#>  ExperimentHub            2.10.0    2023-10-26 [1] Bioconductor
#>  fansi                    1.0.6     2023-12-08 [1] CRAN (R 4.3.1)
#>  fastmap                  1.1.1     2023-02-24 [1] CRAN (R 4.3.0)
#>  fields                   15.2      2023-08-17 [1] CRAN (R 4.3.0)
#>  filelock                 1.0.3     2023-12-11 [1] CRAN (R 4.3.1)
#>  foreach                  1.5.2     2022-02-02 [1] CRAN (R 4.3.0)
#>  fs                       1.6.3     2023-07-20 [1] CRAN (R 4.3.0)
#>  generics                 0.1.3     2022-07-05 [1] CRAN (R 4.3.0)
#>  GenomeInfoDb           * 1.38.1    2023-11-11 [1] Bioconductor
#>  GenomeInfoDbData         1.2.11    2023-11-08 [1] Bioconductor
#>  GenomicAlignments        1.38.0    2023-10-26 [1] Bioconductor
#>  GenomicRanges          * 1.54.1    2023-10-30 [1] Bioconductor
#>  ggbeeswarm               0.7.2     2023-04-29 [1] CRAN (R 4.3.0)
#>  ggplot2                  3.4.4     2023-10-12 [1] CRAN (R 4.3.1)
#>  ggrepel                  0.9.4     2023-10-13 [1] CRAN (R 4.3.1)
#>  glue                     1.6.2     2022-02-24 [1] CRAN (R 4.3.0)
#>  golem                    0.4.1     2023-06-05 [1] CRAN (R 4.3.0)
#>  gridExtra                2.3       2017-09-09 [1] CRAN (R 4.3.0)
#>  gtable                   0.3.4     2023-08-21 [1] CRAN (R 4.3.0)
#>  here                   * 1.0.1     2020-12-13 [1] CRAN (R 4.3.0)
#>  htmltools                0.5.7     2023-11-03 [1] CRAN (R 4.3.1)
#>  htmlwidgets              1.6.4     2023-12-06 [1] CRAN (R 4.3.1)
#>  httpuv                   1.6.13    2023-12-06 [1] CRAN (R 4.3.1)
#>  httr                     1.4.7     2023-08-15 [1] CRAN (R 4.3.0)
#>  interactiveDisplayBase   1.40.0    2023-10-26 [1] Bioconductor
#>  IRanges                * 2.36.0    2023-10-26 [1] Bioconductor
#>  irlba                    2.3.5.1   2022-10-03 [1] CRAN (R 4.3.0)
#>  iterators                1.0.14    2022-02-05 [1] CRAN (R 4.3.0)
#>  jquerylib                0.1.4     2021-04-26 [1] CRAN (R 4.3.0)
#>  jsonlite                 1.8.8     2023-12-04 [1] CRAN (R 4.3.1)
#>  KEGGREST                 1.42.0    2023-10-26 [1] Bioconductor
#>  knitr                    1.45      2023-10-30 [1] CRAN (R 4.3.1)
#>  later                    1.3.2     2023-12-06 [1] CRAN (R 4.3.1)
#>  lattice                  0.21-9    2023-10-01 [1] CRAN (R 4.3.2)
#>  lazyeval                 0.2.2     2019-03-15 [1] CRAN (R 4.3.0)
#>  lifecycle                1.0.4     2023-11-07 [1] CRAN (R 4.3.0)
#>  limma                    3.58.1    2023-11-02 [1] Bioconductor
#>  locfit                   1.5-9.8   2023-06-11 [1] CRAN (R 4.3.0)
#>  magick                   2.8.2     2023-12-20 [1] CRAN (R 4.3.1)
#>  magrittr                 2.0.3     2022-03-30 [1] CRAN (R 4.3.0)
#>  maps                     3.4.2     2023-12-15 [1] CRAN (R 4.3.1)
#>  Matrix                   1.6-1.1   2023-09-18 [1] CRAN (R 4.3.2)
#>  MatrixGenerics         * 1.14.0    2023-10-26 [1] Bioconductor
#>  matrixStats            * 1.2.0     2023-12-11 [1] CRAN (R 4.3.1)
#>  memoise                  2.0.1     2021-11-26 [1] CRAN (R 4.3.0)
#>  mime                     0.12      2021-09-28 [1] CRAN (R 4.3.0)
#>  munsell                  0.5.0     2018-06-12 [1] CRAN (R 4.3.0)
#>  paletteer                1.5.0     2022-10-19 [1] CRAN (R 4.3.0)
#>  pillar                   1.9.0     2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig                2.0.3     2019-09-22 [1] CRAN (R 4.3.0)
#>  plotly                   4.10.3    2023-10-21 [1] CRAN (R 4.3.1)
#>  png                      0.1-8     2022-11-29 [1] CRAN (R 4.3.0)
#>  promises                 1.2.1     2023-08-10 [1] CRAN (R 4.3.0)
#>  purrr                    1.0.2     2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache                  0.16.0    2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3              1.8.2     2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo                     1.25.0    2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils                  2.12.3    2023-11-18 [1] CRAN (R 4.3.1)
#>  R6                       2.5.1     2021-08-19 [1] CRAN (R 4.3.0)
#>  rappdirs                 0.3.3     2021-01-31 [1] CRAN (R 4.3.0)
#>  RColorBrewer             1.1-3     2022-04-03 [1] CRAN (R 4.3.0)
#>  Rcpp                     1.0.11    2023-07-06 [1] CRAN (R 4.3.0)
#>  RCurl                    1.98-1.13 2023-11-02 [1] CRAN (R 4.3.1)
#>  rematch2                 2.1.2     2020-05-01 [1] CRAN (R 4.3.0)
#>  reprex                   2.0.2     2022-08-17 [1] CRAN (R 4.3.0)
#>  restfulr                 0.0.15    2022-06-16 [1] CRAN (R 4.3.0)
#>  rjson                    0.2.21    2022-01-09 [1] CRAN (R 4.3.0)
#>  rlang                    1.1.2     2023-11-04 [1] CRAN (R 4.3.0)
#>  rmarkdown                2.25      2023-09-18 [1] CRAN (R 4.3.1)
#>  rprojroot                2.0.4     2023-11-05 [1] CRAN (R 4.3.0)
#>  Rsamtools                2.18.0    2023-10-26 [1] Bioconductor
#>  RSQLite                  2.3.4     2023-12-08 [1] CRAN (R 4.3.1)
#>  rstudioapi               0.15.0    2023-07-07 [1] CRAN (R 4.3.0)
#>  rsvd                     1.0.5     2021-04-16 [1] CRAN (R 4.3.0)
#>  rtracklayer              1.62.0    2023-10-26 [1] Bioconductor
#>  S4Arrays                 1.2.0     2023-10-26 [1] Bioconductor
#>  S4Vectors              * 0.40.2    2023-11-25 [1] Bioconductor 3.18 (R 4.3.2)
#>  sass                     0.4.8     2023-12-06 [1] CRAN (R 4.3.1)
#>  ScaledMatrix             1.10.0    2023-10-24 [1] Bioconductor
#>  scales                   1.3.0     2023-11-28 [1] CRAN (R 4.3.1)
#>  scater                   1.30.1    2023-11-16 [1] Bioconductor
#>  scuttle                  1.12.0    2023-10-24 [1] Bioconductor
#>  sessioninfo              1.2.2     2021-12-06 [1] CRAN (R 4.3.0)
#>  shiny                    1.8.0     2023-11-17 [1] CRAN (R 4.3.1)
#>  shinyWidgets             0.8.0     2023-08-30 [1] CRAN (R 4.3.0)
#>  SingleCellExperiment   * 1.24.0    2023-10-24 [1] Bioconductor
#>  SingleR                * 2.4.0     2023-10-24 [1] Bioconductor
#>  spam                     2.10-0    2023-10-23 [1] CRAN (R 4.3.1)
#>  SparseArray              1.2.2     2023-11-08 [1] Bioconductor
#>  sparseMatrixStats        1.14.0    2023-10-26 [1] Bioconductor
#>  SpatialExperiment      * 1.12.0    2023-10-26 [1] Bioconductor
#>  spatialLIBD            * 1.14.1    2023-11-30 [1] Bioconductor 3.18 (R 4.3.0)
#>  statmod                  1.5.0     2023-01-06 [1] CRAN (R 4.3.0)
#>  stringi                  1.8.3     2023-12-11 [1] CRAN (R 4.3.1)
#>  stringr                  1.5.1     2023-11-14 [1] CRAN (R 4.3.1)
#>  styler                   1.10.2    2023-08-29 [1] CRAN (R 4.3.0)
#>  SummarizedExperiment   * 1.32.0    2023-10-24 [1] Bioconductor
#>  tibble                   3.2.1     2023-03-20 [1] CRAN (R 4.3.0)
#>  tidyr                    1.3.0     2023-01-24 [1] CRAN (R 4.3.0)
#>  tidyselect               1.2.0     2022-10-10 [1] CRAN (R 4.3.0)
#>  utf8                     1.2.4     2023-10-22 [1] CRAN (R 4.3.1)
#>  vctrs                    0.6.5     2023-12-01 [1] CRAN (R 4.3.1)
#>  vipor                    0.4.7     2023-12-18 [1] CRAN (R 4.3.1)
#>  viridis                  0.6.4     2023-07-22 [1] CRAN (R 4.3.0)
#>  viridisLite              0.4.2     2023-05-02 [1] CRAN (R 4.3.0)
#>  withr                    2.5.2     2023-10-30 [1] CRAN (R 4.3.1)
#>  xfun                     0.41      2023-11-01 [1] CRAN (R 4.3.1)
#>  XML                      3.99-0.16 2023-11-29 [1] CRAN (R 4.3.1)
#>  xtable                   1.8-4     2019-04-21 [1] CRAN (R 4.3.0)
#>  XVector                  0.42.0    2023-10-26 [1] Bioconductor
#>  yaml                     2.3.8     2023-12-11 [1] CRAN (R 4.3.1)
#>  zlibbioc                 1.48.0    2023-10-26 [1] Bioconductor
#> 
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
@boyiguo1 boyiguo1 added the bug Something isn't working label Jan 2, 2024
@lahuuki
Copy link
Member

lahuuki commented Jan 10, 2024

Hi Boyi,
This error was the result of src_spe$BayesSpace_harmony_09 being an integer, the root of the error is here where the pairwise comparison is set up. However this probably also cause unexpected behaviour with registration_stats_anova and registration_stats_enrichment as well since the model is not categorical but continuous.

This can be solved by replacing the numeric categories with syntactically valid character instead like this: src_spe$BayesSpace_harmony_09 <- paste0("k09d",src_spe$BayesSpace_harmony_09)

Non-syntatic characters names also cause issues like:
Error in limma::makeContrasts(contrasts = z, levels = registration_model) : The levels must by syntactically valid names in R, see help(make.names).

We should include a check that var_registration is categorical and syntactically valid.

Thanks for pointing out this Bug!

@boyiguo1
Copy link
Author

I think @lahuuki's explanation is perfect. I just want to add another corner case if other's not paying attention with categorical and syntactically valid.

The variable has to be a string or factor whose labels follows the syntactically valid.

The error will remain if someone uses the following line of code

> spe$BayesSpace_harmony_09 <- factor(spe$BayesSpace_harmony_09)
> levels(spe$BayesSpace_harmony_09)
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9"

There's no error with the following code

> spe$BayesSpace_harmony_09 <- factor(
+   spe$BayesSpace_harmony_09,
+   labels = paste0("k09d", levels(spe$BayesSpace_harmony_09))
+ )
> levels(spe$BayesSpace_harmony_09 )
[1] "k09d1" "k09d2" "k09d3" "k09d4" "k09d5" "k09d6" "k09d7"
[8] "k09d8" "k09d9"

@lcolladotor
Copy link
Member

@lahuuki can you use Boyi's two reproducible examples to build a unit tests for these functions?

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants