doubleTime is a method to estimate the timing of whole-genome doubling event(s) on a clone tree using SNVs

shahcompbio/doubleTime

doubleTime

doubleTime is a method to estimate the timing of whole-genome doubling (WGD) event(s) on a clone tree using SNVs. The current doubleTime software is designed for DLP+ single-cell whole-genome sequencing data.

Method

The doubleTime algorithm consists of the following steps:

  1. Construct a clone tree from the SBMClone output using the perfect phylogeny algorithm.
  2. Construct clone copy-number profiles and summarize SNV-covering reads at the clone level.
  3. Assign SNVs to branches of the tree.
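Steps 1 and 3 rest on the classic perfect phylogeny construction. The sketch below shows that construction on a binary clone-by-SNV presence matrix. It assumes the SBMClone output has already been reduced to such a 0/1 matrix, which is an illustrative simplification: it is not the code in infer_sbmclone_tree.py, and the function names are hypothetical. Every SNV is identified by the set of clones that carry it. These sets must be pairwise nested or disjoint, and each distinct set becomes a tree node whose incoming branch carries those SNVs.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Optional, Tuple

Clade = FrozenSet[str]  # set of clones below a tree node


def is_laminar(sets: List[Clade]) -> bool:
    """Every pair of sets is either nested or disjoint (no conflicting SNV pair)."""
    for i, a in enumerate(sets):
        for b in sets[i + 1:]:
            if a & b and not (a <= b or b <= a):
                return False
    return True


def perfect_phylogeny(
    matrix: Dict[str, Dict[str, int]],
) -> Tuple[Dict[Clade, Optional[Clade]], Dict[str, Optional[Clade]], Dict[Clade, List[str]]]:
    """Build a rooted perfect phylogeny from a binary clone x SNV matrix (hypothetical helper).

    matrix[clone][snv] is 1 if the clone carries the SNV, else 0.
    Returns (parent of each node, node each clone hangs from,
    SNVs on the branch into each node). None denotes the root.
    """
    # SNVs carried by exactly the same clones sit on the same branch.
    branch_snvs: Dict[Clade, List[str]] = defaultdict(list)
    all_snvs = sorted({snv for row in matrix.values() for snv in row})
    for snv in all_snvs:
        carriers = frozenset(c for c, row in matrix.items() if row.get(snv, 0) == 1)
        if carriers:
            branch_snvs[carriers].append(snv)

    # Larger clades first, so every possible parent is seen before its children.
    nodes = sorted(branch_snvs, key=len, reverse=True)
    if not is_laminar(nodes):
        raise ValueError("no perfect phylogeny: two SNVs have conflicting clone sets")

    # The parent of a clade is the smallest strictly larger clade containing it.
    parent: Dict[Clade, Optional[Clade]] = {}
    for i, node in enumerate(nodes):
        supersets = [p for p in nodes[:i] if node < p]
        parent[node] = min(supersets, key=len) if supersets else None

    # Each clone is a leaf under the smallest clade that contains it.
    leaf_parent: Dict[str, Optional[Clade]] = {}
    for clone in matrix:
        containing = [n for n in nodes if clone in n]
        leaf_parent[clone] = min(containing, key=len) if containing else None

    return parent, leaf_parent, dict(branch_snvs)


# Toy example: s1 is truncal, s2 is shared by A and B, s3 and s4 are private.
matrix = {
    "A": {"s1": 1, "s2": 1, "s3": 1, "s4": 0},
    "B": {"s1": 1, "s2": 1, "s3": 0, "s4": 0},
    "C": {"s1": 1, "s2": 0, "s3": 0, "s4": 1},
}
parent, leaf_parent, branch_snvs = perfect_phylogeny(matrix)
```

With clean binary data, step 3 falls out of the construction because branch_snvs already places each SNV on a branch. Real data are noisy, and the pipeline works from clone-level read counts (step 2), so treat this as an idealized picture of what the assignment step produces.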

Inputs

Running doubleTime requires the following input files:

  • cna_adata: Single-cell haplotype-specific copy-number calls
  • snv_adata: Counts for the number of SNV-covering and SNV-supporting reads in each cell for each SNV
  • genome_fasta_filename: Reference genome that was used for SNV and CNA analysis (FASTA file)
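The snv_adata input holds per-cell counts of SNV-covering and SNV-supporting reads. Per-cell coverage in DLP+ data is very sparse, which is why step 2 of the method summarizes these reads at the clone level. The sketch below shows that pooling with plain dictionaries standing in for the AnnData matrices. The data layout and the function name are hypothetical, and this is not doubleTime's preprocessing code.

```python
from collections import defaultdict
from typing import Dict, Tuple

Counts = Dict[str, Dict[str, int]]  # cell -> SNV -> read count


def aggregate_by_clone(total: Counts, alt: Counts, clone_of: Dict[str, str]) -> Tuple[Counts, Counts]:
    """Sum per-cell covering (total) and supporting (alt) reads within each clone.

    Cells missing from clone_of (e.g. filtered out) are skipped.
    """
    clone_total: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    clone_alt: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for cell, snv_counts in total.items():
        clone = clone_of.get(cell)
        if clone is None:
            continue
        for snv, n in snv_counts.items():
            clone_total[clone][snv] += n
            clone_alt[clone][snv] += alt.get(cell, {}).get(snv, 0)
    # Convert nested defaultdicts back to plain dicts for the caller.
    return (
        {c: dict(d) for c, d in clone_total.items()},
        {c: dict(d) for c, d in clone_alt.items()},
    )


total = {"c1": {"s1": 3}, "c2": {"s1": 2}, "c3": {"s1": 4}, "c4": {"s1": 7}}
alt = {"c1": {"s1": 1}, "c2": {"s1": 0}, "c3": {"s1": 4}, "c4": {"s1": 7}}
clone_of = {"c1": "A", "c2": "A", "c3": "B"}  # c4 is unassigned
clone_total, clone_alt = aggregate_by_clone(total, alt, clone_of)
```

Here clone A pools 5 covering and 1 supporting read for s1, giving a clone-level variant allele fraction of 0.2. No single cell has enough reads to support that estimate.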

These input files are passed to the pipeline through a configuration file (e.g. demo.yaml). To run doubleTime on your own data, provide a configuration file in the same format as demo.yaml that points to the appropriate input files. Note that the SBMClone results, which group cells into clones, are not listed in the configuration file even though they are an input to the doubleTime algorithm. They are generated by the first step of the pipeline (script infer_sbmclone_tree.py).

Additionally, the configuration file contains parameters for the doubleTime algorithm:

  • wgd_depth: The depth of the whole-genome doubling event(s) in the clone tree. If set to a negative integer (e.g. -1), the pipeline attempts to detect the WGD depth automatically. Otherwise, the pipeline uses the given value as the WGD depth. For more information on setting this parameter manually, see the notebooks/demo.ipynb notebook. The recommended default is -1.
  • tree_snv_min_clone_size: Minimum number of cells in a clone. Clones with fewer cells than this threshold are removed from the tree. The recommended default is 20.
  • binarization_threshold: Threshold for binarizing SNV calls from SBMClone. The recommended default is 0.01.
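The sketch below restates the semantics of wgd_depth and tree_snv_min_clone_size described above. The helper names are hypothetical and doubleTime's internal handling may differ. Only the parameter names and the recommended defaults come from this README. A negative wgd_depth means "detect automatically", and a clone is kept only if it has at least tree_snv_min_clone_size cells.

```python
from typing import Dict, Optional

# Recommended defaults as stated in this README.
DEFAULT_PARAMS = {
    "wgd_depth": -1,
    "tree_snv_min_clone_size": 20,
    "binarization_threshold": 0.01,
}


def resolve_wgd_depth(wgd_depth: int) -> Optional[int]:
    """None means 'detect automatically' (any negative value); otherwise use the given depth."""
    return None if wgd_depth < 0 else wgd_depth


def filter_small_clones(clone_sizes: Dict[str, int], min_size: int) -> Dict[str, int]:
    """Drop clones with fewer than min_size cells, mirroring tree_snv_min_clone_size."""
    return {clone: n for clone, n in clone_sizes.items() if n >= min_size}


params = dict(DEFAULT_PARAMS)
kept = filter_small_clones({"A": 150, "B": 35, "C": 12}, params["tree_snv_min_clone_size"])
```

With the default of 20, clone C (12 cells) would be dropped before SNVs are assigned to the tree.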

Outputs

doubleTime produces output files with the following suffixes (the prefix is the patient name):

  • _annotated_tree.pickle: Clone tree with WGD events assigned to branches and SNV-derived branch lengths
  • _cna_clustered.h5 / _snv_clustered.h5: Clustered AnnData objects representing aggregate copy-number calls and SNV counts at the clone level
  • _tree_snv_assignment.csv: Table containing SNV metrics and assignments to branches of the tree
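A run can be checked for completeness by building the expected file names from the patient prefix and the suffixes listed above. This is a minimal sketch. The helper names are hypothetical, and it assumes all outputs land in a single flat directory, which may not match the pipeline's actual layout.

```python
from pathlib import Path
from typing import Dict, List, Union

# Output suffixes listed in this README; each file is named <patient><suffix>.
OUTPUT_SUFFIXES = {
    "annotated_tree": "_annotated_tree.pickle",
    "cna_clustered": "_cna_clustered.h5",
    "snv_clustered": "_snv_clustered.h5",
    "snv_assignment": "_tree_snv_assignment.csv",
}


def expected_outputs(output_dir: Union[str, Path], patient: str) -> Dict[str, Path]:
    """Map each output kind to its expected path (assumes a flat output directory)."""
    out = Path(output_dir)
    return {kind: out / f"{patient}{suffix}" for kind, suffix in OUTPUT_SUFFIXES.items()}


def missing_outputs(output_dir: Union[str, Path], patient: str) -> List[str]:
    """Return the output kinds whose file does not exist yet."""
    return [kind for kind, path in expected_outputs(output_dir, patient).items() if not path.exists()]
```

Once the files exist, the CSV can be opened with any table reader (e.g. pandas.read_csv). The .h5 files are AnnData objects and can presumably be read with anndata.read_h5ad. Unpickling the tree requires the packages that defined the pickled objects, so load it from within the doubletime environment.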

An intermediate tree without annotations (_tree.pickle) is also produced and can be safely ignored or deleted.

In addition to the .csv, .pickle, and .h5 outputs, doubleTime produces visualizations of the clone tree and SNV assignments as a series of .pdf figures, generated by the plot_qc_output.py script at the end of the Snakemake pipeline. The most commonly used figures are:

  • _wgd_tree.pdf: Clone tree whose x-axis is the total number of SNVs, with branches split at WGD events
  • _CpG_tree.pdf: Same as _wgd_tree.pdf, but with the x-axis showing the number of CpG SNVs instead of all SNVs

Descriptions of the other figures can be found in the comments of the plot_qc_output.py script.
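In these figures a branch's length is its SNV count, so a node's horizontal position is the total number of SNVs on its path from the root. The sketch below computes that layout arithmetic. It uses a hypothetical tree representation (a node-to-parent dictionary plus per-branch SNV counts) rather than the pickled tree object doubleTime writes, and it ignores the splitting of branches at WGD events. Passing CpG SNV counts instead of all SNV counts would give the _CpG_tree.pdf axis.

```python
from typing import Dict, Optional


def cumulative_snv_positions(
    parent: Dict[str, Optional[str]], branch_snvs: Dict[str, int]
) -> Dict[str, int]:
    """For each node, return the number of SNVs on the path from the root to that node.

    parent maps node -> parent node (None for the root); branch_snvs maps
    node -> number of SNVs on the branch ending at that node.
    """
    positions: Dict[str, int] = {}

    def position(node: str) -> int:
        # Memoized recursion: a node's position is its parent's plus its own branch.
        if node not in positions:
            p = parent[node]
            above = position(p) if p is not None else 0
            positions[node] = above + branch_snvs.get(node, 0)
        return positions[node]

    for node in parent:
        position(node)
    return positions


tree = {"root": None, "AB": "root", "A": "AB", "B": "AB", "C": "root"}
counts = {"root": 100, "AB": 20, "A": 5, "B": 8, "C": 30}
x = cumulative_snv_positions(tree, counts)
```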

See demo/output/plots/ or notebooks/demo.ipynb for examples.
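The _CpG_tree.pdf figure counts only CpG SNVs. This README does not define the term, so the sketch below assumes one common definition: a C>T change at a C followed by G, or the equivalent G>A change at a G preceded by C on the opposite strand. Such mutations are often plotted separately because they tend to accumulate at a roughly clock-like rate. The function name is hypothetical and this is not doubleTime's classifier.

```python
def is_cpg_transition(seq: str, pos: int, ref: str, alt: str) -> bool:
    """True if the SNV is a C>T at a CpG (or G>A, the same change on the other strand).

    seq is a stretch of reference sequence and pos a 0-based offset into it.
    """
    seq, ref, alt = seq.upper(), ref.upper(), alt.upper()
    if seq[pos] != ref:
        raise ValueError(f"reference mismatch at {pos}: {seq[pos]!r} != {ref!r}")
    if ref == "C" and alt == "T":
        return pos + 1 < len(seq) and seq[pos + 1] == "G"  # C followed by G
    if ref == "G" and alt == "A":
        return pos > 0 and seq[pos - 1] == "C"  # G preceded by C
    return False
```

In practice the sequence context would come from the reference FASTA named by genome_fasta_filename, for example a small window fetched with pysam.FastaFile(path).fetch(chrom, start, end).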

Setup

  1. Clone this repository
  2. Install dependencies from environment.yml: conda env create -n doubletime --file environment.yml
  3. Activate the conda environment: conda activate doubletime
  4. Install the doubleTime package: pip install . (or python setup.py install). To install in development (editable) mode instead, use pip install -e . (or python setup.py develop).

Optional: to run the demo, download the reference genome GRCh37-lite.fa from https://www.bcgsc.ca/downloads/genomes/9606/hg19/1000genomes/bwa_ind/genome/GRCh37-lite.fa and point genome_fasta_filename in demo.yaml to its local path.

Usage

With the doubletime conda environment activated, run the Snakemake pipeline with the following command:

snakemake --snakefile doubleTime.smk --configfile demo.yaml --cores 1

Here, demo.yaml is the configuration file containing the input data and parameters for the pipeline, and --cores 1 sets the number of cores to use. Running doubleTime on the input data in demo/input should produce the output files in demo/output.

To run doubleTime outside the Snakemake pipeline, see the notebooks/demo.ipynb notebook for an example of running the algorithm standalone. Import the package at the top of a Python file with import doubletime as dt. Its classes and functions are then available through the model module (dt.ml), the preprocessing module (dt.pp), the tools module (dt.tl), and the plotting module (dt.pl). Be careful to run the preprocessing steps in the correct order before running the doubleTime model.
