h5ad to BAM matching #42

Open
davek44 opened this issue Feb 3, 2024 · 0 comments

Hi, thanks for an incredible resource! I'm trying to relate the Smart-seq BAMs to the cell annotations, but I'm running into problems. Namely, the gene expression vectors from the fully processed TabulaSapiens.h5ad frequently do not seem to correspond to the BAMs with the matching names.

For example, cell row 448105 in the h5ad has index B107919_H10_S31.homo.gencode.v30.ERCC.chrM.
That appears to match s3://czb-tabula-sapiens/Pilot1/alignment-gencode/SS2/B107919_H10_S31.homo.gencode.v30.ERCC.chrM.Aligned.out.sorted.bam.

The most highly expressed genes for this cell in the h5ad via 'raw_counts' are FTL, GPX1, TSP, PFN1, etc. However, none of those genes have aligned reads in the BAM file.

In the AWS bucket, there is a Pilot1 count table s3://czb-tabula-sapiens/Pilot1/smartseq2_gene_count_tables/pilot/190627_A00111_0335_BHLMG5DSXX.csv.
The top expressed genes in this table for cell 'B107919_H10_S31.homo' have many aligned reads in the BAM, as expected.
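
The three identifiers mentioned so far (the h5ad index, the BAM path, and the CSV cell name) differ only in their suffixes, so one way to line them up is to reduce each to a shared key. The following is a minimal sketch, assuming the key is everything before the first `.homo` in the name. That rule is inferred from the three examples in this post, not from any documented naming scheme, and `cell_key` is a hypothetical helper.

```python
import posixpath


def cell_key(name: str) -> str:
    """Reduce an h5ad index, CSV cell name, or BAM path to a shared key.

    Assumption: the key is everything before the first '.homo' in the
    basename. This holds for the three examples in this post but is
    otherwise unverified.
    """
    base = posixpath.basename(name)  # drops the s3://.../ prefix, if any
    key, sep, _ = base.partition(".homo")
    if not sep:
        raise ValueError(f"no '.homo' marker in {name!r}")
    return key


h5ad_index = "B107919_H10_S31.homo.gencode.v30.ERCC.chrM"
csv_cell = "B107919_H10_S31.homo"
bam_path = (
    "s3://czb-tabula-sapiens/Pilot1/alignment-gencode/SS2/"
    "B107919_H10_S31.homo.gencode.v30.ERCC.chrM.Aligned.out.sorted.bam"
)

# All three names should collapse to a single key if they refer to the same cell.
keys = {cell_key(n) for n in (h5ad_index, csv_cell, bam_path)}
print(keys)
```

If the set contains more than one key, the names do not refer to the same plate/well/sample under this rule.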

The expression vectors for this cell from the bucket CSV and from the h5ad have a Spearman correlation of 0.08, and a scatter plot indicates they do not match.
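
For anyone who wants to reproduce this kind of comparison, here is a minimal dependency-free sketch: it restricts two gene-to-count mappings to their shared gene names and computes a Spearman rank correlation with average ranks for ties. The function names and the toy counts are made up for illustration and are not taken from the Tabula Sapiens data; loading the h5ad and CSV is left out. In practice `scipy.stats.spearmanr` would be the usual choice.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (vx * vy) ** 0.5


def spearman_on_shared_genes(counts_a, counts_b):
    """Return (rho, n_shared) for two {gene: count} mappings."""
    genes = sorted(set(counts_a) & set(counts_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    ranks_a = average_ranks([counts_a[g] for g in genes])
    ranks_b = average_ranks([counts_b[g] for g in genes])
    return pearson(ranks_a, ranks_b), len(genes)


# Toy counts, invented for illustration only.
h5ad_counts = {"FTL": 50, "GPX1": 30, "PFN1": 20, "ACTB": 0}
csv_counts = {"FTL": 5, "GPX1": 3, "PFN1": 2, "ACTB": 0, "GAPDH": 7}

rho, n_shared = spearman_on_shared_genes(h5ad_counts, csv_counts)
print(f"Spearman rho = {rho:.2f} over {n_shared} shared genes")
```

Restricting to shared gene names first matters because the two tables may not list genes in the same order or use the same gene set, and a positional comparison would then produce a misleadingly low correlation.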

Could you help me understand where I'm going wrong here? Thanks!
