flowjo_to_gatingset memory error #111
How about loading a subset of this dataset to see whether it goes through, e.g. gs <- flowjo_to_gatingset(ws, name = 1, subset = 1:2)? Also try turning on detailed logging (i.e. |
Hi @mikejiang, I tried subsetting to each file, but every subset results in the same error. Also, am I using the log function correctly? I get the same error as without it.
|
It is strange that your log is not displayed. I wonder whether it even reaches the C parser logic. Can you paste the |
@mikejiang |
Sorry, I forgot to paste the traceback. Here it is:
|
I can't reproduce your error. It seems to parse OK for me (on both the Bioconductor release and devel branches):
library(CytoML)
wsfile <- "~/Downloads/fcs and wsp/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
library(flowWorkspace)
gh_pop_compare_stats(gs[[1]])
   openCyto.freq   xml.freq openCyto.count xml.count node
1:    1.00000000 1.00000000        1541997   1541997 root
2:    0.32648767 0.32648767         503443    503443 /Time, Event_length subset
3:    0.08735448 0.08735448          43978     43978 Time, Event_length subset/Time, Event_length subset
But I am on Linux; I will give it another try on Windows.
> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] flowWorkspace_4.0.6 CytoML_2.0.5 BiocManager_1.30.10
loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 plyr_1.8.6 pillar_1.4.4 compiler_4.0.0 cytolib_2.0.3 RColorBrewer_1.1-2
[7] base64enc_0.1-3 tools_4.0.0 zlibbioc_1.34.0 digest_0.6.25 jsonlite_1.6.1 gtable_0.3.0
[13] lifecycle_0.2.0 tibble_3.0.1 lattice_0.20-41 png_0.1-7 pkgconfig_2.0.3 rlang_0.4.6
[19] graph_1.66.0 rstudioapi_0.11 Rgraphviz_2.32.0 yaml_2.2.1 parallel_4.0.0 hexbin_1.28.1
[25] gridExtra_2.3 xml2_1.3.2 stringr_1.4.0 dplyr_1.0.0 generics_0.0.2 vctrs_0.3.1
[31] stats4_4.0.0 grid_4.0.0 tidyselect_1.1.0 glue_1.4.1 data.table_1.12.8 Biobase_2.48.0
[37] R6_2.4.1 jpeg_0.1-8.1 XML_3.99-0.3 RBGL_1.64.0 latticeExtra_0.6-29 ggplot2_3.3.1
[43] RProtoBufLib_2.0.0 purrr_0.3.4 magrittr_1.5 scales_1.1.1 ellipsis_0.3.1 matrixStats_0.56.0
[49] BiocGenerics_0.34.0 colorspace_1.4-1 flowCore_2.0.1 ncdfFlow_2.34.0 stringi_1.4.6 munsell_0.5.0
[55] RcppParallel_5.0.1 crayon_1.3.4 ggcyto_1.16.0 |
I've verified it worked fine on windows as well. |
Hi, I've reinstalled R, cytolib, flowCore, flowWorkspace and CytoML from github and I still get the same error.
edit: wrong sessionInfo text |
If I read the file in flowCore, subset it, and write it as a new fcs.
|
I don't know if it helps, but by doing what I described above (writing FCS files of different sizes) I have been able to figure out the exact number of events leading to the error:
If I reduce the number of columns from 56 to 55, the number of rows needed to trigger the error increases to 1220162.
Looking at the number of elements in the matrix:
1220162 * 55 = 67108910 does not work
It seems the threshold is somewhere between 67108855 and 67108888 elements.
Some more tests:
My guess is that you might be able to reproduce this error if you make some large matrices. |
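The element counts quoted above can be checked with a few lines of base R. This is only a sketch of the arithmetic: the variable names are made up for illustration, and the observation that 2^26 falls inside the reported window is a side note, not something the comment itself states.

```r
# Element counts reported in the comment (rows * columns of the event matrix).
# Plain doubles are used on purpose so the products cannot overflow in R.
failing <- 1220162 * 55   # the case reported as "does not work"
lower   <- 67108855       # reported lower bound of the threshold
upper   <- 67108888       # reported upper bound of the threshold

pow26 <- 2^26             # 67108864, a power of two

threshold_check <- list(
  failing_elements = failing,
  pow26            = pow26,
  pow26_in_window  = lower < pow26 && pow26 < upper,
  failing_exceeds  = failing > pow26
)
print(threshold_check)
```

A power of two sitting inside the window would be consistent with a fixed-width size limit somewhere in the pipeline, but the numbers alone do not prove that.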
I don't think the error comes from h5:
library(CytoML)
wsfile <- "../Downloads/tt/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
h5 <- cf_get_h5_file_path(get_cytoframe_from_cs(gs_cyto_data(gs),1))
utils:::format.object_size(file.size(h5), "auto")
[1] "329.5 Mb" |
OK. Somehow I was using 32-bit R on Windows. After switching to 64-bit, I am able to reproduce your error now. I will try to get to the bottom of it. |
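Whether a session is a 32-bit or 64-bit build can be checked from within R itself. A minimal sketch using base R only; the helper name r_build_bits is hypothetical and not part of any package mentioned in this thread:

```r
# Report whether the running R process is a 32-bit or 64-bit build.
r_build_bits <- function() {
  # .Machine$sizeof.pointer is 4 on 32-bit builds and 8 on 64-bit builds
  8L * .Machine$sizeof.pointer
}

bits <- r_build_bits()
message("R build: ", bits, "-bit (", R.version$arch, ")")
```

Running this before trying to reproduce a platform-specific report helps confirm that both sides are comparing the same kind of build.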
It turned out to be an integer overflow issue. On Linux, |
The fix is in the Bioconductor development branch (it will probably appear tomorrow). Alternatively, you can install it from source from the GitHub repo (if you know how to build the package from source on Windows). |
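For readers unfamiliar with the failure mode, 32-bit integer overflow can be illustrated in plain R. This is a generic sketch, not the code path that was actually fixed. R signals the overflow with NA and a warning, whereas compiled code can end up with a silently wrong size instead.

```r
# R's integer type is a signed 32-bit value on every platform.
int_max <- .Machine$integer.max

# Integer arithmetic past the limit yields NA (with a warning, suppressed here).
overflowed <- suppressWarnings(int_max + 1L)

# The same arithmetic in double precision does not overflow.
as_double <- as.numeric(int_max) + 1

print(overflowed)
print(as_double)
```

Doing size computations in a wider type, as the double-precision line does here, is the usual way to avoid this class of bug.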
Hi Mike, |
I am trying to read gates from a FlowJo workspace; however, I encounter a memory error when calling
flowjo_to_gatingset()
on my FlowJo wsp. When run in reprex (above), the error message differs from the one shown in the console. In the console the error is:
error: arma::memory::acquire(): out of memory Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc
The dataset I am trying to read consists of 21 fcs-files that take up 1.99 GB of storage space in total.
My computer has 128 GB of RAM.
I have tested two other datasets, one being 192 MB and the other being 6.98 MB. These are read successfully.