CalicoST is a probabilistic model that infers allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics.CalicoST has the following key features:
- Identifies allele-specific integer copy numbers for each transcribed region, revealing events such as copy neutral loss of heterozygosity (CNLOH) and mirrored subclonal CNAs that are invisible to total copy number analysis.
- Assigns each spot a clone label indicating whether the spot is primarily normal cells or a cancer clone with aberration copy number profile.
- Infers a phylogeny relating the identified cancer clones as well as a phylogeography that combines genetic evolution and spatial dissemination of clones.
- Handles normal cell admixture in SRT technologies hat are not single-cell resolution (e.g. 10x Genomics Visium) to infer more accurate allele-specific copy numbers and cancer clones.
- Simultaneously analyzes multiple regions or aligned SRT slices from the same tumor.
The package has tested on the following Linux operating systems: SpringdaleOpenEnterprise 9.2 (Parma) and CentOS Linux 7 (Core).
First setup a conda environment from the environment.yml
file:
git clone https://github.com/raphael-group/CalicoST.git
cd CalicoST
conda env create -f environment.yml --name calicost_env
Then, install CalicoST using pip by
conda activate calicost_env
pip install -e .
Setting up the conda environments takes around 15 minutes on an HPC head node.
CalicoST requires allele count matrices for reference-phased A and B alleles for inferring allele-specific CNAs, and provides a snakemake pipeline for obtaining the required matrices from a BAM file. Run the following commands in CalicoST directory for installing additional package, Eagle2, for snakemake preprocessing pipeline.
mkdir external
wget --directory-prefix=external https://storage.googleapis.com/broad-alkesgroup-public/Eagle/downloads/Eagle_v2.4.1.tar.gz
tar -xzf external/Eagle_v2.4.1.tar.gz -C external
Based on the inferred cancer clones and allele-specific CNAs by CalicoST, we apply Startle to reconstruct a phylogenetic tree along the clones. Install Startle by
git clone --recurse-submodules https://github.com/raphael-group/startle.git
cd startle
mkdir build; cd build
cmake -DLIBLEMON_ROOT=<lemon path>\
-DCPLEX_INC_DIR=<cplex include path>\
-DCPLEX_LIB_DIR=<cplex lib path>\
-DCONCERT_INC_DIR=<concert include path>\
-DCONCERT_LIB_DIR=<concert lib path>\
..
make
To infer allele-specific CNAs, we generate allele count matrices in this preprocessing step. We followed the recommended pipeline by Numbat, which is designed for scRNA-seq data to infer clones and CNAs: first genotyping using the BAM file by cellsnp-lite (included in the conda environment) and reference-based phasing by Eagle2. Download the following panels for genotyping and reference-based phasing.
- SNP panel - 0.5GB in size. You can also choose other SNP panels from cellsnp-lite webpage.
- Phasing panel- 9.0GB in size. Unzip the panel after downloading.
Replace the following paths config.yaml
:
region_vcf
: Replace with the path of downloaded SNP panel.phasing_panel
: Replace with the unzipped directory of the downloaded phasing panel.spaceranger_dir
: Replace with the spaceranger directory of your Visium data, which should contain the BAM filepossorted_genome_bam.bam
.output_snpinfo
: Replace with the desired output directory.- Replace
calicost_dir
andeagledir
with the path to the cloned CalicoST directory and downloaded Eagle2 directory.
Then you can run preprocessing pipeline by
snakemake --cores <number threads> --configfile config.yaml --snakefile calicost.smk all
Replace the paths in the parameter configuration file configuration_purity
with the corresponding data/reference file paths and run
OMP_NUM_THREADS=1 <CalicoST directory>/src/calicost/estimate_tumor_proportion.py -c configuration_purity
Replace the paths in parameter configuration file configuration_cna
with the corresponding data/reference file paths and run
OMP_NUM_THREADS=1 python <CalicoST directory>/src/calicost/calicost_main.py -c configuration_cna
When jointly inferring clones and CNAs across multiple SRT slices, prepare a table with the following columns (See examples/example_input_filelist
as an example).
Path to BAM file | sample ID | Path to Spaceranger outs
Modify configuration_cna_multi
with paths to the table and run
OMP_NUM_THREADS=1 python <CalicoST directory>/src/calicost/calicost_main.py -c configuration_cna_multi
python <CalicoST directory>/src/calicost/phylogeny_startle.py -c <CalicoST clone and CNA output directory> -s <startle executable path> -o <output directory>
Check out our readthedocs for the following tutorials:
-
Inferring clones and allele-specific CNAs on simulated data The simulated count matrices and parameter configuration file are available from
examples/simulated_example.tar.gz
. CalicoST takes about 2h to finish on this example. -
Inferring tumor purity, clones, allele-specific CNAs, and phylogeography on prostate cancer data The transcript count, allele count matrices, and running configuration fies are available from
examples/prostate_example.tar.gz
. This sample contains five slices and over 10000 spots, CalicoST takes about 9h to finish on this example.
The above snakemake run will create a folder calicost
in the directory of downloaded example data. Within this folder, each random initialization of CalicoST generates a subdirectory of calicost/clone*
.
CalicoST generates the following key files of each random initialization:
- clone_labels.tsv: The inferred clone labels for each spot.
- cnv_seglevel.tsv: Allele-specific copy numbers for each clone for each genome segment.
- cnv_genelevel.tsv: The projected allele-specific copy numbers from genome segments to the covered genes.
- cnv_diploid_seglevel.tsv, cnv_triploid_seglevel.tsv, cnv_tetraploid_seglevel.tsv, cnv_diploid_genelevel.tsv, cnv_triploid_genelevel.tsv, cnv_tetraploid_genelevel.tsv: Allele-specific copy numbers when enforcing a ploidy for each genome segment or each gene.
See the following examples of the key files.
head -10 calicost/clone3_rectangle0_w1.0/clone_labels.tsv
BARCODES clone_label
spot_0 2
spot_1 2
spot_2 2
spot_3 2
spot_4 2
spot_5 2
spot_6 2
spot_7 2
spot_8 0
head -10 calicost/clone3_rectangle0_w1.0/cnv_seglevel.tsv
CHR START END clone0 A clone0 B clone1 A clone1 B clone2 A clone2 B
1 1001138 1616548 1 1 1 1 1 1
1 1635227 2384877 1 1 1 1 1 1
1 2391775 6101016 1 1 1 1 1 1
1 6185020 6653223 1 1 1 1 1 1
1 6785454 7780639 1 1 1 1 1 1
1 7784320 8020748 1 1 1 1 1 1
1 8026738 9271273 1 1 1 1 1 1
1 9292894 10375267 1 1 1 1 1 1
1 10398592 11922488 1 1 1 1 1 1
head -10 calicost/clone3_rectangle0_w1.0/cnv_genelevel.tsv
gene clone0 A clone0 B clone1 A clone1 B clone2 A clone2 B
A1BG 1 1 1 1 1 1
A1CF 1 1 1 1 1 1
A2M 1 1 1 1 1 1
A2ML1-AS1 1 1 1 1 1 1
AACS 1 1 1 1 1 1
AADAC 1 1 1 1 1 1
AADACL2-AS1 1 1 1 1 1 1
AAK1 1 1 1 1 1 1
AAMP 1 1 1 1 1 1
CalicoST graphs the following plots for visualizing the inferred cancer clones in space and allele-specific copy number profiles for each random initialization.
- plots/clone_spatial.pdf: The spatial distribution of inferred cancer clones and normal regions (grey color, clone 0 by default)
- plots/rdr_baf_defaultcolor.pdf: The read depth ratio (RDR) and B allele frequency (BAF) along the genome for each clone. Higher RDR indicates higher total copy numbers, and a deviation-from-0.5 BAF indicates allele imbalance due to allele-specific CNAs.
- plots/acn_genome.pdf: The default allele-specific copy numbers along the genome.
- plots/acn_genome_diploid.pdf, plots/acn_genome_triploid.pdf, plots/acn_genome_tetraploid.pdf: Allele-specific copy numbers when enforcing a ploidy.
The allele-specific copy number plots have the following color legend.
CalicoST uses the following command-line packages and python for extracting the BAF information
- samtools
- cellsnp-lite
- Eagle2
- pysam
- snakemake
CalicoST uses the following packages for the remaining steps to infer allele-specific copy numbers and cancer clones:
- numpy
- scipy
- pandas
- scikit-learn
- scanpy
- anndata
- numba
- tqdm
- statsmodels
- networkx
- matplotlib
- seaborn
- snakemake
The CalicoST manuscript is available on bioRxiv. If you use CalicoST for your work, please cite our paper.
@article{ma2024inferring,
title={Inferring allele-specific copy number aberrations and tumor phylogeography from spatially resolved transcriptomics},
author={Ma, Cong and Balaban, Metin and Liu, Jingxian and Chen, Siqi and Ding, Li and Raphael, Benjamin},
journal={bioRxiv},
pages={2024--03},
year={2024},
publisher={Cold Spring Harbor Laboratory}
}