doubleTime

doubleTime is a method for estimating the timing of whole-genome doubling (WGD) events on a clone tree using SNVs. The current doubleTime software is designed for DLP+ single-cell whole-genome sequencing data.

Method

The doubleTime algorithm consists of the following steps:

Construct clone tree from SBMClone output using the perfect phylogeny algorithm.
Construct clone copy-number profiles and summarize SNV-covering reads at the clone level.
Assign SNVs to branches of the tree.
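
The tree-construction step can be illustrated with a toy sketch. This is not the doubleTime implementation: it assumes a small binary matrix in which matrix[i][j] == 1 means clone i carries SNV block j, and the function name and data layout are hypothetical. It checks the perfect phylogeny condition (any two SNV blocks mark clone sets that are either disjoint or nested) and then attaches each clade to its smallest enclosing clade.

```python
from itertools import combinations


def perfect_phylogeny(matrix, clone_names):
    """Return {clade: parent_clade} for a conflict-free binary matrix.

    Hypothetical layout: matrix[i][j] == 1 means clone i carries SNV block j.
    Each clade is a frozenset of clone names; the root holds all clones.
    """
    n_blocks = len(matrix[0])
    # Each column defines the set of clones that carry that SNV block.
    clades = {
        frozenset(name for name, row in zip(clone_names, matrix) if row[j])
        for j in range(n_blocks)
    }
    clades.discard(frozenset())

    # Perfect phylogeny condition: two clades are either disjoint or nested.
    for a, b in combinations(clades, 2):
        if a & b and not (a <= b or b <= a):
            raise ValueError("matrix violates the perfect phylogeny condition")

    root = frozenset(clone_names)
    clades.add(root)

    parents = {}
    for clade in clades:
        if clade == root:
            continue
        # The parent is the smallest clade that strictly contains this one.
        supersets = [other for other in clades if clade < other]
        parents[clade] = min(supersets, key=len)
    return parents
```

For three clones A, B and C where one block is shared by all, one by A and B only, and one is private to C, the sketch returns two clades ({A, B} and {C}) that both hang off the root.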

Inputs

Running doubleTime requires the following input files:

cna_adata: Single-cell haplotype-specific copy-number calls
snv_adata: Counts for the number of SNV-covering and SNV-supporting reads in each cell for each SNV
genome_fasta_filename: Reference genome that was used for SNV and CNA analysis (FASTA file)

These input files are passed into the pipeline through a configuration file (e.g. demo.yaml). To run doubleTime on your own data, provide a configuration file in the same format as demo.yaml that points to the appropriate input files. Note that the SBMClone results, which group cells into clones, are not needed in the configuration file even though they are an input to the doubleTime algorithm, because they are generated by the first step of the pipeline (script infer_sbmclone_tree.py).

Additionally, the configuration file contains parameters for the doubleTime algorithm:

wgd_depth: The depth of the whole-genome doubling event(s) in the clone tree. If set to a negative integer (e.g. -1), the pipeline will attempt to automatically detect the WGD depth. If set to a non-negative value, the pipeline will use that value as the WGD depth. For more information on how to set this parameter manually, see the notebooks/demo.ipynb notebook. This should be set to -1 by default.
tree_snv_min_clone_size: Minimum number of cells in a clone. Clones smaller than this threshold will be removed from the tree. This should be set to 20 by default.
binarization_threshold: Threshold for binarizing SNV calls from SBMClone. This should be set to 0.01 by default.
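
As a quick sanity check before launching the pipeline, the inputs and parameters above can be validated in a few lines of Python. This is a minimal sketch, not part of doubleTime: check_config is a hypothetical helper, and it assumes the configuration has already been loaded into a flat dictionary keyed by the names listed above (the actual layout of demo.yaml may differ).

```python
# Hypothetical helper; key names and defaults are taken from this README.
REQUIRED_INPUTS = ("cna_adata", "snv_adata", "genome_fasta_filename")
DEFAULT_PARAMS = {
    "wgd_depth": -1,                  # negative: detect the WGD depth automatically
    "tree_snv_min_clone_size": 20,    # clones with fewer cells are removed
    "binarization_threshold": 0.01,   # threshold for binarizing SNV calls
}


def check_config(config):
    """Return a copy of config with defaults filled in; fail on missing inputs."""
    missing = [key for key in REQUIRED_INPUTS if key not in config]
    if missing:
        raise KeyError(f"missing required inputs: {', '.join(missing)}")
    # User-supplied values take precedence over the documented defaults.
    return {**DEFAULT_PARAMS, **config}
```

Calling check_config on a dictionary that only contains the three input paths returns the same dictionary plus the three documented default parameters.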

Outputs

doubleTime produces output files with the following suffixes (prefix is the patient name):

_annotated_tree.pickle: Clone tree with WGD events assigned to branches and SNV-derived branch lengths
_cna_clustered.h5 / _snv_clustered.h5: Clustered anndatas representing aggregate copy-number calls and SNV counts at the clone level
_tree_snv_assignment.csv: Table containing SNV metrics and assignments to branches of the tree

An intermediate tree without annotations (_tree.pickle) is also produced and can be safely ignored or deleted.
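
To confirm that a run has finished, the expected output paths can be derived from the suffixes listed above. The sketch below is illustrative only: it assumes all outputs for a patient sit in a single directory and are named with the patient name followed by the suffix. Both helper names and the example patient name are hypothetical.

```python
from pathlib import Path

# Output suffixes as listed in this README.
OUTPUT_SUFFIXES = (
    "_annotated_tree.pickle",
    "_cna_clustered.h5",
    "_snv_clustered.h5",
    "_tree_snv_assignment.csv",
)


def expected_outputs(output_dir, patient):
    """Map each suffix to the path where the output is expected (assumed layout)."""
    return {
        suffix: Path(output_dir) / f"{patient}{suffix}"
        for suffix in OUTPUT_SUFFIXES
    }


def missing_outputs(output_dir, patient):
    """List expected output paths that do not exist on disk."""
    paths = expected_outputs(output_dir, patient).values()
    return [str(path) for path in paths if not path.exists()]
```

For example, missing_outputs("demo/output", "patient1") returns an empty list once every expected file for the hypothetical patient patient1 is present.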

In addition to output .csv, .pickle, and .h5 files, doubleTime also produces visualizations of the clone tree and SNV assignments through a series of .pdf figures generated by the plot_qc_output.py script at the end of the snakemake pipeline. The most commonly used figures are as follows:

_wgd_tree.pdf: Clone tree with the x-axis being the total number of SNVs and branches split by WGD events
_CpG_tree.pdf: Same as _wgd_tree.pdf, but with the x-axis showing the number of CpG SNVs instead of all SNVs

Descriptions of the other figures can be found in the comments of the plot_qc_output.py script.

See demo/output/plots/ or notebooks/demo.ipynb for examples.

Setup

Clone this repository
Install dependencies from environment.yml: conda env create -n doubletime --file environment.yml
Activate the conda environment: conda activate doubletime
Install the doubleTime package: pip install . or python setup.py install. To install in development (editable) mode instead, use pip install -e . or python setup.py develop.

Optional: to run the demo, you will need to point demo.yaml to the reference genome GRCh37-lite.fa, which can be found here: https://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa

Usage

With the doubletime conda environment activated, run the snakemake pipeline with the following command:

snakemake --snakefile doubleTime.smk --configfile demo.yaml --cores 1

Here, we specify the configuration file (demo.yaml), which contains the input data and parameters for the pipeline. We also specify the number of cores to use (1 in this case). Running doubleTime on the input data in demo/input should produce the output files in demo/output.
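
When the pipeline has to be launched for several configuration files, the same command can be assembled from Python. This is a minimal sketch using only the three flags shown above; build_snakemake_command and run_pipeline are hypothetical helper names, not part of doubleTime.

```python
import subprocess


def build_snakemake_command(configfile, cores=1, snakefile="doubleTime.smk"):
    """Assemble the snakemake invocation shown above as an argument list."""
    return [
        "snakemake",
        "--snakefile", str(snakefile),
        "--configfile", str(configfile),
        "--cores", str(cores),
    ]


def run_pipeline(configfile, cores=1):
    """Run the pipeline and raise CalledProcessError if snakemake fails."""
    subprocess.run(build_snakemake_command(configfile, cores), check=True)
```

Passing the arguments as a list avoids shell quoting issues when a configuration path contains spaces.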

If you wish to run doubleTime outside of this snakemake pipeline, see the notebooks/demo.ipynb notebook for an example of running the doubleTime algorithm in a standalone manner. Import the package at the top of a Python file using import doubletime as dt. After importing, you can call doubletime-specific classes and functions from the model module (dt.ml), the preprocessing module (dt.pp), the tools module (dt.tl), and the plotting module (dt.pl). Be careful to run the preprocessing steps in the correct order before running the doubleTime model.
