flowjo_to_gatingset memory error #111

Open
SansMorel opened this issue Aug 13, 2020 · 17 comments

@SansMorel

I am trying to read gates from a FlowJo workspace, but I encounter a memory error when calling flowjo_to_gatingset() on my FlowJo wsp.

library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
wsfile <- list.files(pattern="wsp", full = T)
library(CytoML)
ws <- open_flowjo_xml(wsfile)
fj_ws_get_samples(ws)
#>    sampleID                     name   count pop.counts
#> 1        16 20191217_C57KC003_NC.fcs 1541997          9
#> 2        17 20191217_C57KC003_NC.fcs  290848         11
#> 3        18 20191217_C57KC003_NC.fcs  353657         11
#> 4        19 20191217_C57KC003_NC.fcs  142301         11
#> 5        20 20191217_C57KC003_NC.fcs  556701         11
#> 6        21 20191217_C57KC003_NC.fcs  225741         11
#> 7         1 20191217_C57KC003_NC.fcs  501074         11
#> 8         2 20191217_C57KC003_NC.fcs  453926          8
#> 9         3 20191217_C57KC003_NC.fcs   29503          8
#> 10        4 20191217_C57KC003_NC.fcs  352020          8
#> 11        5 20191217_C57KC003_NC.fcs  223846         11
#> 12        6 20191217_C57KC003_NC.fcs  426527         11
#> 13        7 20191217_C57KC003_NC.fcs  471965         11
#> 14        8 20191217_C57KC003_NC.fcs  387698         11
#> 15        9 20191217_C57KC003_NC.fcs  108852         11
#> 16       10 20191217_C57KC003_NC.fcs  445379         11
#> 17       11 20191217_C57KC003_NC.fcs  693633         11
#> 18       12 20191217_C57KC003_NC.fcs  452876         11
#> 19       13 20191217_C57KC003_NC.fcs  565591         11
#> 20       14 20191217_C57KC003_NC.fcs  752113         11
#> 21       15 20191217_C57KC003_NC.fcs  583126         11
fj_ws_get_sample_groups(ws)
#>      groupName groupID sampleID
#> 1  All Samples       0       16
#> 2  All Samples       0       17
#> 3  All Samples       0       18
#> 4  All Samples       0       19
#> 5  All Samples       0       20
#> 6  All Samples       0       21
#> 7  All Samples       0        1
#> 8  All Samples       0        2
#> 9  All Samples       0        3
#> 10 All Samples       0        4
#> 11 All Samples       0        5
#> 12 All Samples       0        6
#> 13 All Samples       0        7
#> 14 All Samples       0        8
#> 15 All Samples       0        9
#> 16 All Samples       0       10
#> 17 All Samples       0       11
#> 18 All Samples       0       12
#> 19 All Samples       0       13
#> 20 All Samples       0       14
#> 21 All Samples       0       15
gs <- flowjo_to_gatingset(ws, name = 1)
#> Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.1252 
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.1252   
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.1252
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian Bokmål_Norway.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CytoML_2.0.5        flowWorkspace_4.0.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0    xfun_0.16           purrr_0.3.4        
#>  [4] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.2        
#>  [7] generics_0.0.2      htmltools_0.5.0     stats4_4.0.2       
#> [10] ncdfFlow_2.34.0     yaml_2.2.1          base64enc_0.1-3    
#> [13] flowCore_2.0.1      XML_3.99-0.5        RBGL_1.64.0        
#> [16] rlang_0.4.7         hexbin_1.28.1       pillar_1.4.6       
#> [19] glue_1.4.1          Rgraphviz_2.32.0    BiocGenerics_0.34.0
#> [22] RColorBrewer_1.1-2  plyr_1.8.6          matrixStats_0.56.0 
#> [25] jpeg_0.1-8.1        lifecycle_0.2.0     stringr_1.4.0      
#> [28] zlibbioc_1.34.0     RProtoBufLib_2.0.0  munsell_0.5.0      
#> [31] gtable_0.3.0        cytolib_2.0.3       evaluate_0.14      
#> [34] latticeExtra_0.6-29 Biobase_2.48.0      knitr_1.29         
#> [37] parallel_4.0.2      highr_0.8           Rcpp_1.0.5         
#> [40] scales_1.1.1        jsonlite_1.7.0      RcppParallel_5.0.2 
#> [43] graph_1.66.0        gridExtra_2.3       ggplot2_3.3.2      
#> [46] png_0.1-7           digest_0.6.25       stringi_1.4.6      
#> [49] dplyr_1.0.1         grid_4.0.2          tools_4.0.2        
#> [52] magrittr_1.5        tibble_3.0.3        crayon_1.3.4       
#> [55] pkgconfig_2.0.3     ellipsis_0.3.1      xml2_1.3.2         
#> [58] data.table_1.13.0   rmarkdown_2.3       R6_2.4.1           
#> [61] ggcyto_1.16.0       compiler_4.0.2

When run in reprex (above), the error message differs from the one shown in the console. In the console the error is:
error: arma::memory::acquire(): out of memory Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc

The dataset I am trying to read consists of 21 FCS files that take up 1.99 GB of storage space in total.
My computer has 128 GB of RAM.
I have tested two other datasets, one being 192 MB and the other being 6.98 MB. These are read successfully.

@mikejiang
Member

How about loading a subset of this dataset to see if it goes through, e.g.

gs <- flowjo_to_gatingset(ws, name = 1, subset = 1:2)

Also try turning on detailed logging (i.e. set_log_level("Gate")) and paste the messages that appear immediately before the error.
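
The subsetting suggestion above can be sketched as a loop that tries each sample on its own and records which ones fail. `try_each` and `fake_loader` are hypothetical names used only for illustration; they are not part of CytoML, and the stand-in loader simply pretends that the first sample fails.

```r
# Hypothetical helper (not part of CytoML): call `loader(i)` for each index
# and record the error message, or NA when the call succeeds.
try_each <- function(loader, ids) {
  vapply(ids, function(i) {
    tryCatch({
      loader(i)
      NA_character_
    }, error = function(e) conditionMessage(e))
  }, character(1))
}

# Stand-in loader for demonstration only: pretends that sample 1 fails.
fake_loader <- function(i) {
  if (i == 1) stop("std::bad_alloc")
  invisible(TRUE)
}

res <- try_each(fake_loader, 1:3)
failing <- which(!is.na(res))  # indices of the samples that raised an error
```

With the real workspace, the loader would presumably be something like `function(i) flowjo_to_gatingset(ws, name = 1, subset = i)`, run over the 21 samples listed in the workspace.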

@SansMorel
Author

Hi, @mikejiang

I tried subsetting each file, but all result in the same error. Also, am I using the log function correctly? I just get the same error as without using it.

library(flowWorkspace)
#> As part of improvements to flowWorkspace, some behavior of
#> GatingSet objects has changed. For details, please read the section
#> titled "The cytoframe and cytoset classes" in the package vignette:
#> 
#>   vignette("flowWorkspace-Introduction", "flowWorkspace")
wsfile <- list.files(pattern="wsp", full = T)
library(CytoML)
ws <- open_flowjo_xml(wsfile)
fj_ws_get_samples(ws)
#>    sampleID                     name   count pop.counts
#> 1        16 20191217_C57KC003_NC.fcs 1541997          9
#> 2        17 20191217_C57KC003_NC.fcs  290848         11
#> 3        18 20191217_C57KC003_NC.fcs  353657         11
#> 4        19 20191217_C57KC003_NC.fcs  142301         11
#> 5        20 20191217_C57KC003_NC.fcs  556701         11
#> 6        21 20191217_C57KC003_NC.fcs  225741         11
#> 7         1 20191217_C57KC003_NC.fcs  501074         11
#> 8         2 20191217_C57KC003_NC.fcs  453926          8
#> 9         3 20191217_C57KC003_NC.fcs   29503          8
#> 10        4 20191217_C57KC003_NC.fcs  352020          8
#> 11        5 20191217_C57KC003_NC.fcs  223846         11
#> 12        6 20191217_C57KC003_NC.fcs  426527         11
#> 13        7 20191217_C57KC003_NC.fcs  471965         11
#> 14        8 20191217_C57KC003_NC.fcs  387698         11
#> 15        9 20191217_C57KC003_NC.fcs  108852         11
#> 16       10 20191217_C57KC003_NC.fcs  445379         11
#> 17       11 20191217_C57KC003_NC.fcs  693633         11
#> 18       12 20191217_C57KC003_NC.fcs  452876         11
#> 19       13 20191217_C57KC003_NC.fcs  565591         11
#> 20       14 20191217_C57KC003_NC.fcs  752113         11
#> 21       15 20191217_C57KC003_NC.fcs  583126         11
fj_ws_get_sample_groups(ws)
#>      groupName groupID sampleID
#> 1  All Samples       0       16
#> 2  All Samples       0       17
#> 3  All Samples       0       18
#> 4  All Samples       0       19
#> 5  All Samples       0       20
#> 6  All Samples       0       21
#> 7  All Samples       0        1
#> 8  All Samples       0        2
#> 9  All Samples       0        3
#> 10 All Samples       0        4
#> 11 All Samples       0        5
#> 12 All Samples       0        6
#> 13 All Samples       0        7
#> 14 All Samples       0        8
#> 15 All Samples       0        9
#> 16 All Samples       0       10
#> 17 All Samples       0       11
#> 18 All Samples       0       12
#> 19 All Samples       0       13
#> 20 All Samples       0       14
#> 21 All Samples       0       15
set_log_level("Gate")
#> [1] "Gate"
gs <- flowjo_to_gatingset(ws, name = 1, subset = 1)
#> Error in (function (ws, group_id, subset, execute, path, cytoset, h5_dir, : std::bad_alloc
get_log_level()
#> [1] "Gate"
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 18363)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=Norwegian Bokmål_Norway.1252 
#> [2] LC_CTYPE=Norwegian Bokmål_Norway.1252   
#> [3] LC_MONETARY=Norwegian Bokmål_Norway.1252
#> [4] LC_NUMERIC=C                            
#> [5] LC_TIME=Norwegian Bokmål_Norway.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CytoML_2.0.5        flowWorkspace_4.0.6
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0    xfun_0.16           purrr_0.3.4        
#>  [4] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.2        
#>  [7] generics_0.0.2      htmltools_0.5.0     stats4_4.0.2       
#> [10] ncdfFlow_2.34.0     yaml_2.2.1          base64enc_0.1-3    
#> [13] flowCore_2.0.1      XML_3.99-0.5        RBGL_1.64.0        
#> [16] rlang_0.4.7         hexbin_1.28.1       pillar_1.4.6       
#> [19] glue_1.4.1          Rgraphviz_2.32.0    BiocGenerics_0.34.0
#> [22] RColorBrewer_1.1-2  plyr_1.8.6          matrixStats_0.56.0 
#> [25] jpeg_0.1-8.1        lifecycle_0.2.0     stringr_1.4.0      
#> [28] zlibbioc_1.34.0     RProtoBufLib_2.0.0  munsell_0.5.0      
#> [31] gtable_0.3.0        cytolib_2.0.3       evaluate_0.14      
#> [34] latticeExtra_0.6-29 Biobase_2.48.0      knitr_1.29         
#> [37] parallel_4.0.2      highr_0.8           Rcpp_1.0.5         
#> [40] scales_1.1.1        jsonlite_1.7.0      RcppParallel_5.0.2 
#> [43] graph_1.66.0        gridExtra_2.3       ggplot2_3.3.2      
#> [46] png_0.1-7           digest_0.6.25       stringi_1.4.6      
#> [49] dplyr_1.0.1         grid_4.0.2          tools_4.0.2        
#> [52] magrittr_1.5        tibble_3.0.3        crayon_1.3.4       
#> [55] pkgconfig_2.0.3     ellipsis_0.3.1      xml2_1.3.2         
#> [58] data.table_1.13.0   rmarkdown_2.3       R6_2.4.1           
#> [61] ggcyto_1.16.0       compiler_4.0.2

@mikejiang
Member

It is strange that your log is not displayed. I wonder if it even reaches the C parser logic. Can you paste the traceback() result immediately after the error? Since it fails for a single file, would you be able to share the example wsp and FCS file for troubleshooting? ([email protected])

@SansMorel
Author

@mikejiang
I tried making a new wsp to see if the previous one was corrupted. During this process I discovered that a specific file is causing the issue.
This file has 1541997 rows and 56 columns. I'll send you a link to the file via email so you can take a look.

@SansMorel
Author

Sorry, I forgot to paste the traceback. Here it is:

4: stop(structure(list(message = "std::bad_alloc", call = (function (ws, 
       group_id, subset, execute, path, cytoset, h5_dir, includeGates, 
       additional_keys, additional_sampleID, keywords, is_pheno_data_from_FCS, 
       keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
       leaf_bool, include_empty_tree, skip_faulty_gate, comps, transform, 
       fcs_file_extension, greedy_match, fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, h5_dir, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x0000018385d680f0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x0000018392b45270>, 
       h5_dir = "C:\\Users\\Sturla\\AppData\\Local\\Temp\\Rtmp6ldHfD", 
       includeGates = TRUE, additional_keys = "$TOT", additional_sampleID = FALSE, 
       keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1), cppstack = NULL), class = c("std::bad_alloc", 
   "C++Error", "error", "condition")))
3: (function (ws, group_id, subset, execute, path, cytoset, h5_dir, 
       includeGates, additional_keys, additional_sampleID, keywords, 
       is_pheno_data_from_FCS, keyword_ignore_case, extend_val, 
       extend_to, channel_ignore_case, leaf_bool, include_empty_tree, 
       skip_faulty_gate, comps, transform, fcs_file_extension, greedy_match, 
       fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, h5_dir, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x0000018385d680f0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x0000018392b45270>, 
       h5_dir = "C:\\Users\\Sturla\\AppData\\Local\\Temp\\Rtmp6ldHfD", 
       includeGates = TRUE, additional_keys = "$TOT", additional_sampleID = FALSE, 
       keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1)
2: do.call(parse_workspace, args)
1: flowjo_to_gatingset(ws, name = 1, execute = T)

@mikejiang
Member

I can't reproduce your error. It seems to parse OK for me (on both the Bioconductor release and devel branches).

library(CytoML)
wsfile <- "~/Downloads/fcs and wsp/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
library(flowWorkspace)
gh_pop_compare_stats(gs[[1]]) 
   openCyto.freq   xml.freq openCyto.count xml.count                                                node
1:    1.00000000 1.00000000        1541997   1541997                                                root
2:    0.32648767 0.32648767         503443    503443                          /Time, Event_length subset
3:    0.08735448 0.08735448          43978     43978 Time, Event_length subset/Time, Event_length subset

But I am on Linux; I will give it another try on Windows.

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.4 LTS

Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C              LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] flowWorkspace_4.0.6 CytoML_2.0.5        BiocManager_1.30.10

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6        plyr_1.8.6          pillar_1.4.4        compiler_4.0.0      cytolib_2.0.3       RColorBrewer_1.1-2 
 [7] base64enc_0.1-3     tools_4.0.0         zlibbioc_1.34.0     digest_0.6.25       jsonlite_1.6.1      gtable_0.3.0       
[13] lifecycle_0.2.0     tibble_3.0.1        lattice_0.20-41     png_0.1-7           pkgconfig_2.0.3     rlang_0.4.6        
[19] graph_1.66.0        rstudioapi_0.11     Rgraphviz_2.32.0    yaml_2.2.1          parallel_4.0.0      hexbin_1.28.1      
[25] gridExtra_2.3       xml2_1.3.2          stringr_1.4.0       dplyr_1.0.0         generics_0.0.2      vctrs_0.3.1        
[31] stats4_4.0.0        grid_4.0.0          tidyselect_1.1.0    glue_1.4.1          data.table_1.12.8   Biobase_2.48.0     
[37] R6_2.4.1            jpeg_0.1-8.1        XML_3.99-0.3        RBGL_1.64.0         latticeExtra_0.6-29 ggplot2_3.3.1      
[43] RProtoBufLib_2.0.0  purrr_0.3.4         magrittr_1.5        scales_1.1.1        ellipsis_0.3.1      matrixStats_0.56.0 
[49] BiocGenerics_0.34.0 colorspace_1.4-1    flowCore_2.0.1      ncdfFlow_2.34.0     stringi_1.4.6       munsell_0.5.0      
[55] RcppParallel_5.0.1  crayon_1.3.4        ggcyto_1.16.0  

@mikejiang
Member

I've verified that it works fine on Windows as well.
So I'd recommend you reinstall the cytolib, flowWorkspace and CytoML packages and see if the issue is resolved.

@SansMorel
Author

SansMorel commented Aug 19, 2020

Hi,

I've reinstalled R, and installed cytolib, flowCore, flowWorkspace and CytoML from GitHub, and I still get the same error.

library(CytoML)
wsfile <- "C:/Users/sturl/Downloads/fcs and wsp/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
#> Error in (function (ws, group_id, subset, execute, path, cytoset, backend_dir, : std::bad_alloc
> traceback()
4: stop(structure(list(message = "std::bad_alloc", call = (function (ws, 
       group_id, subset, execute, path, cytoset, backend_dir, backend, 
       includeGates, additional_keys, additional_sampleID, keywords, 
       is_pheno_data_from_FCS, keyword_ignore_case, extend_val, 
       extend_to, channel_ignore_case, leaf_bool, include_empty_tree, 
       skip_faulty_gate, comps, transform, fcs_file_extension, greedy_match, 
       fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, backend_dir, backend, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x00000259e5c54ba0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x00000259fa278540>, 
       backend_dir = "C:\\Users\\sturl\\AppData\\Local\\Temp\\RtmpCgHZEj", 
       backend = "h5", includeGates = TRUE, additional_keys = "$TOT", 
       additional_sampleID = FALSE, keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1), cppstack = NULL), class = c("std::bad_alloc", 
   "C++Error", "error", "condition")))
3: (function (ws, group_id, subset, execute, path, cytoset, backend_dir, 
       backend, includeGates, additional_keys, additional_sampleID, 
       keywords, is_pheno_data_from_FCS, keyword_ignore_case, extend_val, 
       extend_to, channel_ignore_case, leaf_bool, include_empty_tree, 
       skip_faulty_gate, comps, transform, fcs_file_extension, greedy_match, 
       fcs_parse_arg, num_threads = 1L) 
   {
       .Call(`_CytoML_parse_workspace`, ws, group_id, subset, execute, 
           path, cytoset, backend_dir, backend, includeGates, additional_keys, 
           additional_sampleID, keywords, is_pheno_data_from_FCS, 
           keyword_ignore_case, extend_val, extend_to, channel_ignore_case, 
           leaf_bool, include_empty_tree, skip_faulty_gate, comps, 
           transform, fcs_file_extension, greedy_match, fcs_parse_arg, 
           num_threads)
   })(ws = <pointer: 0x00000259e5c54ba0>, group_id = 0, subset = list(), 
       execute = TRUE, path = "", cytoset = <pointer: 0x00000259fa278540>, 
       backend_dir = "C:\\Users\\sturl\\AppData\\Local\\Temp\\RtmpCgHZEj", 
       backend = "h5", includeGates = TRUE, additional_keys = "$TOT", 
       additional_sampleID = FALSE, keywords = character(0), is_pheno_data_from_FCS = FALSE, 
       keyword_ignore_case = FALSE, extend_val = 0, extend_to = -4000, 
       channel_ignore_case = FALSE, leaf_bool = TRUE, include_empty_tree = FALSE, 
       skip_faulty_gate = FALSE, comps = list(), transform = TRUE, 
       fcs_file_extension = ".fcs", greedy_match = FALSE, fcs_parse_arg = list(), 
       num_threads = 1)
2: do.call(parse_workspace, args)
1: flowjo_to_gatingset(ws)
sessionInfo()
#> R version 4.0.2 (2020-06-22)
#> Platform: x86_64-w64-mingw32/x64 (64-bit)
#> Running under: Windows 10 x64 (build 19041)
#> 
#> Matrix products: default
#> 
#> locale:
#> [1] LC_COLLATE=English_United Kingdom.1252 
#> [2] LC_CTYPE=English_United Kingdom.1252   
#> [3] LC_MONETARY=English_United Kingdom.1252
#> [4] LC_NUMERIC=C                           
#> [5] LC_TIME=English_United Kingdom.1252    
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CytoML_2.1.11
#> 
#> loaded via a namespace (and not attached):
#>  [1] tidyselect_1.1.0    xfun_0.16           purrr_0.3.4        
#>  [4] lattice_0.20-41     colorspace_1.4-1    vctrs_0.3.2        
#>  [7] generics_0.0.2      htmltools_0.5.0     stats4_4.0.2       
#> [10] ncdfFlow_2.34.0     yaml_2.2.1          base64enc_0.1-3    
#> [13] flowCore_2.1.2      RBGL_1.64.0         XML_3.99-0.5       
#> [16] rlang_0.4.7         hexbin_1.28.1       pillar_1.4.6       
#> [19] glue_1.4.1          aws.s3_0.3.21       Rgraphviz_2.32.0   
#> [22] BiocGenerics_0.34.0 RColorBrewer_1.1-2  plyr_1.8.6         
#> [25] matrixStats_0.56.0  jpeg_0.1-8.1        lifecycle_0.2.0    
#> [28] stringr_1.4.0       zlibbioc_1.34.0     RProtoBufLib_2.0.0 
#> [31] gtable_0.3.0        munsell_0.5.0       cytolib_2.1.17     
#> [34] evaluate_0.14       latticeExtra_0.6-29 Biobase_2.48.0     
#> [37] knitr_1.29          parallel_4.0.2      curl_4.3           
#> [40] flowWorkspace_4.1.8 highr_0.8           Rcpp_1.0.5         
#> [43] scales_1.1.1        S4Vectors_0.26.1    jsonlite_1.7.0     
#> [46] RcppParallel_5.0.2  graph_1.66.0        gridExtra_2.3      
#> [49] ggplot2_3.3.2       png_0.1-7           digest_0.6.25      
#> [52] stringi_1.4.6       dplyr_1.0.2         grid_4.0.2         
#> [55] tools_4.0.2         magrittr_1.5        tibble_3.0.3       
#> [58] crayon_1.3.4        aws.signature_0.6.0 pkgconfig_2.0.3    
#> [61] ellipsis_0.3.1      data.table_1.13.0   xml2_1.3.2         
#> [64] rmarkdown_2.3       httr_1.4.2          R6_2.4.1           
#> [67] ggcyto_1.16.0       compiler_4.0.2

edit: wrong sessionInfo text

@SansMorel
Author

If I read the file in flowCore, subset it, write it as a new FCS file,
and then open the wsp in a text editor and change
<Keyword name="$TOT" value="1541997" />
to
<Keyword name="$TOT" value="10000" />
then it works fine.

library(flowCore)
fcs <- read.FCS("test.fcs", truncate_max_range = F)
fcs <- fcs[1:1e4,]
write.FCS(fcs, "test_copy.fcs")

@SansMorel
Author

I don't know if it helps, but by doing what I described above (writing FCS files of different sizes) I have been able to find the exact number of events that triggers the error:
1198372 rows works fine, but 1198373 rows causes the error. This was in a 56-parameter dataset.

If I reduce the number of columns, going from 56 to 55, the number of rows needed to get the error again increases to 1220162.
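
A search like this can be automated with a bisection over the row count. The sketch below is only an illustration: `first_failing` is a hypothetical helper, and `fails_56` is a stand-in predicate that reproduces the 56-column result reported above rather than actually writing and parsing an FCS file. It assumes that failure is monotone in the number of rows.

```r
# Hypothetical helper: smallest n in (lo, hi] for which `fails(n)` is TRUE,
# assuming fails(lo) is FALSE, fails(hi) is TRUE, and failure is monotone in n.
first_failing <- function(fails, lo, hi) {
  while (hi - lo > 1) {
    mid <- lo + (hi - lo) %/% 2
    if (fails(mid)) hi <- mid else lo <- mid
  }
  hi
}

# Stand-in predicate for demonstration only: in practice this would write a
# subset of n_rows events to a new FCS file and try to parse the workspace.
fails_56 <- function(n_rows) n_rows >= 1198373

# 10000 rows is known to work and the full 1541997-row file is known to fail.
threshold <- first_failing(fails_56, lo = 10000, hi = 1541997)
```

Each step halves the interval, so about 21 write-and-parse attempts are enough to cover the range from 10000 to 1541997 rows.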

Looking at the number of elements in the matrix:
1198373 * 56 = 67108888 does not work
1198372 * 56 = 67108832 works

1220162 * 55 = 67108910 does not work
1220161 * 55 = 67108855 works

Seems the threshold is somewhere between 67108855 and 67108888 elements.

Some more tests:
1342177 * 50 = 67108850 works
1266205 * 53 = 67108865 does not work
1491308 * 45 = 67108860 works

My guess is that you might be able to reproduce this error if you make some large matrices.
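
As a cross-check, the results listed above are all consistent with a cutoff at exactly 2^26 = 67108864 elements. The snippet below only re-tabulates the numbers reported in this comment; the 4 bytes per element used in the last line is an assumption (single-precision storage), not something stated in the thread.

```r
# Row/column combinations and outcomes exactly as reported above.
tests <- data.frame(
  rows  = c(1198373, 1198372, 1220162, 1220161, 1342177, 1266205, 1491308),
  cols  = c(56, 56, 55, 55, 50, 53, 45),
  works = c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE, TRUE)
)
tests$elements <- tests$rows * tests$cols

# Candidate cutoff: 2^26 elements.
limit <- 2^26
tests$below_limit <- tests$elements < limit

# TRUE when every working case is below the cutoff and every failing case is not.
consistent <- all(tests$below_limit == tests$works)

# Size of 2^26 elements in MiB, assuming 4 bytes per element (an assumption).
limit_mib <- limit * 4 / 1024^2
```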

@SansMorel
Author

Looking in the temp folder where the h5 data is stored, I see that all the successful tests yield a file of at most 256 MB.
[screenshot attached]

Is there a limit to the h5 file size?

@mikejiang
Member

I don't think the error is from h5:

library(CytoML)
wsfile <- "../Downloads/tt/15-Aug-2020.wsp"
ws <- open_flowjo_xml(wsfile)
gs <- flowjo_to_gatingset(ws, name = 1)
h5 <- cf_get_h5_file_path(get_cytoframe_from_cs(gs_cyto_data(gs),1))
utils:::format.object_size(file.size(h5), "auto")
[1] "329.5 Mb"

@mikejiang
Member

OK. Somehow I was using 32-bit R on Windows. After switching to 64-bit, I am able to reproduce your error now. I will try to get to the bottom of it.

@mikejiang
Member

It turned out to be an integer overflow issue. On Linux, long is 64 bits wide, but MSVC (and the ABI used by Windows) defines long to be 32 bits wide, which overflows on this particularly big dataset. I've switched to int64_t to ensure it is 64 bits on all platforms. It should work now.
You will need to reinstall cytolib, flowWorkspace and CytoML.
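
The effect of a 32-bit signed integer can be illustrated in R itself, whose `integer` type is 32 bits wide on every platform. This is only a sketch: the thread does not say which quantity overflowed in cytolib, and the multiplier of 32 below is an assumption chosen purely because 2^26 × 32 = 2^31, which is one past the largest 32-bit signed value.

```r
# Element counts reported earlier in the thread: the first works, the second fails.
elements_ok  <- 67108855L
elements_bad <- 67108888L

# Assumption for illustration only: some derived quantity is 32 times the
# element count. The thread does not state what was actually computed.
scale_by <- 32L

# Fits in a 32-bit signed integer (maximum is .Machine$integer.max = 2147483647).
ok <- elements_ok * scale_by

# Exceeds the 32-bit range: R returns NA with an integer-overflow warning.
bad <- suppressWarnings(elements_bad * scale_by)

# The same product in a wider type (double here, standing in for int64_t) is fine.
wide <- as.numeric(elements_bad) * scale_by
```

R reports the overflow as `NA` with a warning, whereas in C++ signed overflow is undefined behaviour and typically yields a wrapped, possibly negative value, which could then be passed on as a nonsensical allocation size.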

@DomenicoSkyWalker89

Hi Mike,
thanks for the help.
The error continues, as you can see below. I followed your instructions and reinstalled cytolib, flowWorkspace and CytoML.
[screenshot attached]

The problem persists only for groups 1 and 3, while group 2 loads correctly.
[screenshots attached]

Best,
Domenico

@mikejiang
Member

The fix is in the Bioconductor development branch (it will probably appear tomorrow). Or you can install it from source through the GitHub repo (if you know how to build the package from source on Windows).
So you will be looking for cytolib 2.1.18.

@DomenicoSkyWalker89

Hi Mike,
thanks a lot again.
