The current version of scopen is 0.1.7.
scopen has been tested on the following operating systems:
- macOS Big Sur (11.4)
- Linux (4.18.0)
scopen has been tested with Python 3.6, 3.7, 3.8 and 3.9.
We recommend using Miniconda to set up the environment.
numpy (>=1.20.3)
scipy (>=1.6.3)
h5py (>=3.2.1)
pandas (>=1.2.4)
PyTables (>=3.6.1)
matplotlib (>=3.4.2)
scikit-learn (>=0.24.2)
kneed (>=0.7.0)
The easiest way to install scopen and the required packages is with pip:
pip install scopen
The installation takes about 20 seconds.
Here we describe how to run scopen.
scopen performs imputation and dimensionality reduction on a peak-by-cell
matrix and accepts several input formats. The simplest one is a
text file in which each row represents a peak and each column represents a cell.
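To make the dense layout concrete, here is a minimal sketch that writes a tiny peak-by-cell count matrix as text. The tab delimiter, the header row of cell names, the leading column of peak names, and the helper name `write_dense_matrix` are all assumptions made for illustration; compare against demo/TagCount.txt for the exact layout scopen expects.

```python
import io


def write_dense_matrix(counts, peaks, cells, handle):
    """Write a peak-by-cell matrix as tab-delimited text (assumed layout).

    Rows are peaks and columns are cells. The first line holds the cell
    names and the first field of every other line holds the peak name.
    """
    if len(counts) != len(peaks):
        raise ValueError("need exactly one row of counts per peak")
    # Header: an empty corner field followed by the cell names.
    handle.write("\t".join([""] + list(cells)) + "\n")
    for peak, row in zip(peaks, counts):
        if len(row) != len(cells):
            raise ValueError(f"row for {peak} does not match the number of cells")
        handle.write("\t".join([peak] + [str(value) for value in row]) + "\n")


# Hypothetical toy data: 3 peaks x 2 cells.
peaks = ["chr1:100-600", "chr1:900-1400", "chr2:50-550"]
cells = ["cell_A", "cell_B"]
counts = [[0, 2], [1, 0], [3, 1]]

buffer = io.StringIO()
write_dense_matrix(counts, peaks, cells, buffer)
dense_text = buffer.getvalue()
print(dense_text)
```

Replacing the `io.StringIO` buffer with an open file handle would produce a file that could be passed to --input together with --input_format dense, provided the assumed layout matches.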
Here, we provide example data in the demo folder: a
peak-by-cell count matrix from human hematopoietic cells.
First uncompress the file:
cd demo
gzip -d TagCount.txt.gz
Execute the command below to run scopen:
scopen --input TagCount.txt --input_format dense --output_dir ./ --output_prefix scOpen --output_format dense --verbose 0 --estimate_rank --nc 4
- --input_format: specifies the input format; with dense, a text file is expected
- --output_dir: the directory for the output files; here all output files are saved in the current directory
- --output_prefix: the prefix of the output file names
- --verbose: the verbosity level
- --estimate_rank: the rank is selected automatically
- --nc: the number of cores to use
For more information, run:
scopen --help
The expected running time is about 18 minutes.
After the command finishes, you can find five output files in the current directory:
- scOpen.txt: An imputed matrix. It has the same dimensions as the input and can be used for downstream analysis, such as peak-to-peak co-accessibility prediction.
- scOpen_barcodes.txt: A low-dimensional matrix for cells. The number of dimensions is determined by the option --estimate_rank. It can be used as a dimension-reduced matrix for clustering and visualization.
- scOpen_peaks.txt: A low-dimensional matrix for peaks.
- scOpen_error.pdf: A line plot showing the model selection process, where the x-axis represents the rank (number of dimensions) and the y-axis is the fitting error of NMF. scOpen selects the best model by identifying an elbow point on this curve.
- scOpen_error.txt: A text file containing the data for the above curve.
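The elbow-based model selection described for scOpen_error.pdf can be illustrated with a small sketch. This is not scopen's own implementation (the dependency list suggests the kneed package is involved). It uses a simple swapped-in heuristic: after rescaling both axes to [0, 1], pick the rank whose point lies farthest from the straight line joining the first and last points of the curve. The function name `find_elbow` and the error values are hypothetical.

```python
import math


def find_elbow(ranks, errors):
    """Return the rank at the elbow of an error curve.

    Heuristic (an illustrative stand-in, not scopen's method): rescale
    both axes to [0, 1] and choose the point with the largest
    perpendicular distance to the chord from the first to the last point.
    """
    if len(ranks) != len(errors) or len(ranks) < 3:
        raise ValueError("need at least three (rank, error) pairs")
    x_span = max(ranks) - min(ranks)
    y_span = max(errors) - min(errors)
    if x_span == 0 or y_span == 0:
        raise ValueError("ranks and errors must each vary")
    xs = [(r - min(ranks)) / x_span for r in ranks]
    ys = [(e - min(errors)) / y_span for e in errors]
    # Direction of the chord between the first and last points.
    dx, dy = xs[-1] - xs[0], ys[-1] - ys[0]
    norm = math.hypot(dx, dy)
    distances = [
        abs(dy * (x - xs[0]) - dx * (y - ys[0])) / norm
        for x, y in zip(xs, ys)
    ]
    best = max(range(len(ranks)), key=distances.__getitem__)
    return ranks[best]


# Hypothetical fitting errors for ranks 2..8 (not real scopen output).
ranks = [2, 3, 4, 5, 6, 7, 8]
errors = [100.0, 60.0, 35.0, 30.0, 27.0, 25.0, 24.0]
selected_rank = find_elbow(ranks, errors)
print(selected_rank)  # 4: the error drops steeply up to rank 4, then flattens
```

In practice the rank and error columns would be read from scOpen_error.txt, whose exact column layout should be checked in the file itself.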
As described above, scopen also supports the following input formats:
- 10X: a folder containing barcodes.tsv, matrix.mtx and peaks.bed; it is usually generated by the cellranger-atac pipeline.
- 10Xh5: a peak-by-barcode matrix in HDF5 format.
- sparse: a text file with three columns; the first column contains peaks, the second contains barcodes, and the third is the number of reads.
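To show how the three-column sparse format relates to the dense peak-by-cell matrix, the sketch below expands (peak, barcode, count) triplets into a dense matrix. The tab delimiter, the absence of a header line, and the helper name `triplets_to_dense` are assumptions for illustration only; scopen reads the sparse format itself, so this conversion is not a required step.

```python
def triplets_to_dense(lines):
    """Expand 'peak<TAB>barcode<TAB>count' lines into a dense matrix.

    Returns (peaks, barcodes, matrix), where matrix[i][j] is the read
    count of peaks[i] in barcodes[j]. Pairs that never appear are 0.
    """
    peak_index = {}     # peak name -> row index, in order of first appearance
    barcode_index = {}  # barcode -> column index, in order of first appearance
    entries = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        peak, barcode, count = line.split("\t")
        i = peak_index.setdefault(peak, len(peak_index))
        j = barcode_index.setdefault(barcode, len(barcode_index))
        entries.append((i, j, int(count)))
    matrix = [[0] * len(barcode_index) for _ in peak_index]
    for i, j, count in entries:
        matrix[i][j] += count
    return list(peak_index), list(barcode_index), matrix


# Hypothetical sparse input with 2 peaks and 2 barcodes.
sparse_lines = [
    "chr1:100-600\tAAAC\t2",
    "chr1:100-600\tTTTG\t1",
    "chr2:50-550\tAAAC\t3",
]
peaks, barcodes, matrix = triplets_to_dense(sparse_lines)
print(peaks)     # ['chr1:100-600', 'chr2:50-550']
print(barcodes)  # ['AAAC', 'TTTG']
print(matrix)    # [[2, 1], [3, 0]]
```

The sparse form stores only the non-zero entries, which matters for scATAC-seq data where most peak-cell pairs have no reads.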
scOpen is implemented in Python, while many popular tools for analyzing scATAC-seq data, such as
Signac, are developed in R.
If you prefer R, we also provide a tutorial
here that
shows how to use scopen
as a dimension reduction method in R to analyze scATAC-seq data
from a human peripheral blood mononuclear cell (PBMC) dataset.
Python is gaining popularity in single-cell data analysis.
Two examples are scanpy (for scRNA-seq) and episcanpy (for single-cell epigenomic data, e.g., scATAC-seq).
To ensure scopen is usable in this context, we provide a Jupyter notebook that
shows how to combine scOpen and (epi)scanpy to analyze scATAC-seq data.
For reproducibility, we provide all scripts and data here.