-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Question: library and sample format in cas9.py #45
Comments
Hello Akshita, Thanks for your interest in using ScreenPro2, happy to respond to any questions!
Sounds like a good plan! As part of #44, I'm actively working on more user friendly interface for ScreenPro2.
That's a good question! You need to have the have the library table and modify that based on your sequencing strategy. Can you tell me if what CRISPR library and sequencing protocol you are using? Is it from Replogle et al., eLife (2022) and in that case your library-prep protocol should be this, right? Once you clarify I can help you run the codes in jupyter.
That's true. This function is going to be changes in the next release. For the meantime, this is what I have done: I modified This is how I did fastq2count task for a dataset based on the Replogle et al., eLife (2022) protocol: df_count = cas9.fastq_to_count_dual_guide(
R1_fastq_file_path=f'{fastq_dir}/{sample_id}_R1.fastq.gz',
R2_fastq_file_path=f'{fastq_dir}/{sample_id}_R2.fastq.gz',
trim5p_pos1_start=1,
trim5p_pos1_length=19,
trim5p_pos2_start=1,
trim5p_pos2_length=19,
verbose=verbose
) |
Thanks for your response! Our library is a dual-gRNA library, each protospacer being 20 bp, which we cloned using this protocol from the Weissman lab https://weissman.wi.mit.edu/resources/2022_crispri_protocols/Protocol_1_dual_sgRNA_lib_cloning.pdf We sequenced our samples via paired-end 250 NGS. And I already have trimmed R1 and R2 files that we processed using Geneious. So every read should already be trimmed to the 20 bp guide sequence that we are interested in counting. So, I was currently omitting the trimming arguments in the count function. I was able to run the count function and obtain a df_count output. Let me know how the library table should be set up, what column names I should be using, etc. And let me know if any more information would help! Thanks, |
Okay, it make sense. If you already have your res = df_count.copy()
res = res.sort('count', descending=True)
res = res.with_columns(
pl.concat_str(
[
pl.col("protospacer_a"),
pl.col("protospacer_b")
],
separator=";",
).alias("sequence"),
)
res_map = pl.DataFrame(library).join(
res, on="sequence", how="left"
) ** You may need to iterate at this step to make sure your "sequence" columns are defined properly (for instance, we don't sequence the first bp so we have 19bp in R1 / R2 instead of 20; yours can be a bit different). As a simple QC, you can do something like this print("% mapped reads",
100 * \
res_map.to_pandas()['count'].fillna(0).sum() / \
int(res.select(pl.sum("count")).to_pandas()['count'])
) Best of luck. |
@akshitasax, in the meantime, if you don't mind please ⭐ the repo ;) |
#47 discussed here. |
@akshitasax – I think you get your answers here, right? Fell free to open new issue / reopen this / or email me if you have other questions. |
Hello!
I am interested in using ScreenPro2/ScreenPro to analyze reads from a dual-guide CRISPR screen. Specifically, I wanted to get counts from my reads files and map them to my dual-guide library. I understand that user guidelines for the 'Fastq to counts' pipeline have not been posted to the readme yet, so I wanted to try to run the functions in Cas9.py on a jupyter notebook. What is the format of the dual-guide library that is given as an input in the map_sample_counts_to_library(library, sample) function? And is 'sample' the df_count returned by fastq_to_count_dual_guide()?
Thank you,
Akshita
The text was updated successfully, but these errors were encountered: