Cannot read a specific FCS 3.1 file for unknown reason #269

Open
CroixJeremy2 opened this issue May 29, 2024 · 2 comments

CroixJeremy2 commented May 29, 2024

Hello,
I am quite new to reading FCS files in R, so I apologize in advance if this issue has already been raised (I couldn't find a similar one in RGLab/flowCore/issues, though).

Here are three FCS 3.1 files that have been generated on the same machine (https://research.pasteur.fr/en/equipment/big-foot/) during the same day for a sorting experiment: https://dl.pasteur.fr/fop/P67e8o6r/Test_folder.zip

However, I cannot read Sample_A, while Sample_B and Sample_C are read without problems by read.FCS(). Interestingly, all three files are correctly detected as FCS files by isFCSfile(). The problem occurs on both macOS and Windows with fresh installations of R and flowCore. Here are the script and the R output:

library(flowCore)

# Windows
# setwd("C:/Users/jchantre/Desktop/Test_folder")

# MacOS
# setwd("/Users/jchantre/Desktop/Test_folder")

isFCSfile("Sample_A.fcs")
isFCSfile("Sample_B.fcs")
isFCSfile("Sample_C.fcs")

a = read.FCS("Sample_A.fcs")
b = read.FCS("Sample_B.fcs")
c = read.FCS("Sample_C.fcs")

head(a)
head(b)
head(c)

On macOS: [screenshot]

On Windows: [screenshot]

Is there a way to solve this error message?

> a = read.FCS("Sample_A.fcs")
Error in rawToChar(txt) : 
  embedded nul in string: '*$BEGINANALYSIS*000000000000*$BEGINDATA*000000006769*$BEGINSTEXT*000000000000*$BYTEORD*1,2,3,4*$DATATYPE*F*$ENDANALYSIS*000000000000*$ENDDATA*005622116501*$ENDSTEXT*000000000000*$MODE*L*$NEXTDATA*000000000000*$PAR*29*$P1B*32*$P1E*0,0*$P1N*Time*$P1R*2147483647*$P1L*0*$P1O*0*$P1S*Time*$P1V*0*$P2B*32*$P2E*0,0*$P2N*FSC04-A*$P2R*100000*$P2F*488_FSC*$P2L*488*$P2O*125*$P2S*488 FSC-A*$P2V*306*$P3B*32*$P3E*0,0*$P3N*SSC56-A*$P3R*100000*$P3F*488_SSC*$P3L*488*$P3O*125*$P3S*488 SSC-A*$P3V*362*$P4B*32*$P4E*0,0*$P4N*FSC04-H*$P4R*100000*$P4F*488_FSC*$P4L*488*$P4O*125*$P4S*488 FSC-H*$P4V*306*$P5B*32*$P5E*0,0*$P5N*FSC04-W*$P5R*100000*$P5F*488_FSC*$P5L*488*$P5O*125*$P5S*488 FSC-W*$P5V*306*$P6B*32*$P6E*0,0*$P6N*SSC56-H*$P6R*100000*$P6F*488_SSC*$P6L*488*$P6O*125*$P6S*488 SSC-H*$P6V*362*$P7B*32*$P7E*0,0*$P7N*SSC56-W*$P7R*100000*$P7F*488_SSC*$P7L*488*$P7O*125*$P7S*488 SSC-W*$P7V*362*$P8B*32*$P8E*0,0*$P8N*FSC58-H*$P8R*100000*$P8F*488_FSC_Polar*$P8L*488*$P8O*125*$P8S*488 FSC P
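
For readers unfamiliar with this message: base R's `rawToChar()` refuses to convert a raw vector that contains a nul byte (0x00) into a string. The minimal base-R sketch below reproduces that behavior on three hand-made bytes; it does not use flowCore, and the names `bytes`, `msg`, and `nul_pos` are only illustrative. That Sample_A's text segment runs into non-text bytes is an assumption at this point, not something the sketch proves.

```r
# Three bytes: "A", a nul byte, then "B".
bytes <- as.raw(c(0x41, 0x00, 0x42))

# rawToChar() raises an error when the raw vector contains a nul byte;
# capture the error message instead of stopping the script.
msg <- tryCatch(rawToChar(bytes), error = function(e) conditionMessage(e))
print(msg)

# 1-based positions of the nul bytes, handy for seeing where text stops.
nul_pos <- which(bytes == as.raw(0))
print(nul_pos)
```

The same `which(x == as.raw(0))` check could be applied to bytes read from a file with `readBin(path, what = "raw", n = ...)` to find where nul bytes first appear.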

Thanks in advance for your response,
Best regards,

sessionInfo() in MacOS:

R version 4.3.3 (2024-02-29)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Monterey 12.7.5

Matrix products: default
BLAS: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib; LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] flowCore_2.14.2

loaded via a namespace (and not attached):
[1] compiler_4.3.3 RProtoBufLib_2.14.1 cytolib_2.14.1 Biobase_2.62.0 S4Vectors_0.40.2 BiocGenerics_0.48.1 matrixStats_1.3.0 stats4_4.3.3

sessionInfo() in Windows:

R version 4.4.0 (2024-04-24 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252
[2] LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] flowCore_2.16.0

loaded via a namespace (and not attached):
[1] compiler_4.4.0 RProtoBufLib_2.16.0 cytolib_2.16.0
[4] Biobase_2.64.0 S4Vectors_0.42.0 BiocGenerics_0.50.0
[7] matrixStats_1.3.0 stats4_4.4.0


SamGG commented May 29, 2024

Hi,
In fact, the header of Sample_A is wrong (explanation below).
flowCore could work around this error, but that would not fix the bug on the instrument side. @mikejiang, should/will you handle this?
Meanwhile, if you have only a few files to correct, I suggest using HxD (64-bit), with caution.

My best wishes to Bernd.
Best,
Samuel

In B, the header gives the end of the text segment as 4902 and the start of the data as 4903. Both values are visible in this screenshot.
[screenshot]
These values are correct and agree with the file contents shown in the screenshot below.
[screenshot]

In A, the end of the text segment is wrong (no idea why).
[screenshot]
The real end is at 4888, as the following screenshot shows.
[screenshot]
Use HxD to change 6768 to 4888 and 6769 to 4889, and magic might happen... but since the end of data is also reported incorrectly, you will have a little more work to do if flowCore doesn't handle it.
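
The offsets discussed above are stored in the fixed-width header at the start of every FCS file, so they can also be inspected from R without a hex editor. The following is a minimal base-R sketch, not part of flowCore: `parse_fcs_header()` and `example_header` are hypothetical names, the byte positions follow the FCS 3.1 header layout, and the example header is built in memory from the Sample_B text end and data start quoted in this thread plus made-up values for the remaining fields.

```r
# Parse the segment offsets from the fixed-width FCS header.
# `header` is a raw vector holding at least the first 58 bytes of a file.
parse_fcs_header <- function(header) {
  stopifnot(is.raw(header), length(header) >= 58)
  # Each offset is an 8-character, space-padded ASCII integer.
  field <- function(from, to) {
    as.numeric(trimws(rawToChar(header[from:to])))
  }
  list(
    version        = rawToChar(header[1:6]),
    text_start     = field(11, 18),
    text_end       = field(19, 26),
    data_start     = field(27, 34),
    data_end       = field(35, 42),
    analysis_start = field(43, 50),
    analysis_end   = field(51, 58)
  )
}

# Hypothetical in-memory header: 4902 and 4903 are the Sample_B values
# from this thread; 58, 9999, and the zeros are made up for illustration.
example_header <- charToRaw(sprintf(
  "%-10s%8d%8d%8d%8d%8d%8d",
  "FCS3.1", 58L, 4902L, 4903L, 9999L, 0L, 0L
))

offsets <- parse_fcs_header(example_header)
str(offsets)

# On a real file (path is an assumption), read the first 58 bytes instead:
# hdr <- readBin("Sample_A.fcs", what = "raw", n = 58)
# str(parse_fcs_header(hdr))
```

Comparing `text_end` with the position of the last keyword delimiter seen in a hex editor is one way to confirm a mismatch like the one described for Sample_A.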


gfinak commented May 29, 2024

Thanks, @SamGG.
My 2c here, @CroixJeremy2: your instrument generated an invalid file (it set the begin and end of the data/text segments incorrectly in the header). You can certainly fix this manually, but why would you trust the data from this file? Who knows what else the software or instrument corrupted. If this data matters in any way, I wouldn't trust it.
