Notebooks and libraries for "A Topic Modeling Analysis of TCGA Breast and Lung Cancer Transcriptomic Data"
To analyse the results and reproduce the plots in the paper without rerunning hSBM (hierarchical Stochastic Block Modelling), use the notebook hSBM_postprocess.ipynb.
This repository, following the structure of the paper, is divided into three parts. See Readme.md in each folder for a detailed description of the specific pipeline.
breast: analyses, stochastic block modelling and predictor
lung: analyses, stochastic block modelling, survival analysis and predictor
lung data from the unified dataset, as discussed in the paper
A submodule for plotting hierarchies
You can create a Docker container with all dependencies installed:
docker run -v $PWD:/home/jovyan/work -p 8888:8888 --rm -it --name topic_tcga docker.pkg.github.com/fvalle1/topictcga/topic:latest
Then point your browser to localhost:8888.
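If you prefer to keep the terminal free, the same container can also be started in the background. The sketch below assumes the image builds on the Jupyter Docker Stacks (the /home/jovyan path suggests it), so the notebook URL, including its login token, should appear in the container logs. The function names and the DRY_RUN switch, which prints each command instead of running it, are illustrative additions rather than part of this repository.

```shell
#!/bin/sh
# Sketch: detached variant of the docker run command above.
# start_detached, show_url, stop_container and DRY_RUN are hypothetical
# helpers; set DRY_RUN=1 to print the commands instead of executing them.
IMAGE="docker.pkg.github.com/fvalle1/topictcga/topic:latest"
NAME="topic_tcga"

start_detached() {
  # -d (detached) replaces -it; the current directory is mounted as ~/work
  ${DRY_RUN:+echo} docker run -d -v "$PWD:/home/jovyan/work" -p 8888:8888 \
    --rm --name "$NAME" "$IMAGE"
}

show_url() {
  # Assumption: a Jupyter Docker Stacks image logs the tokenised URL at startup
  ${DRY_RUN:+echo} docker logs "$NAME"
}

stop_container() {
  # Because of --rm, the container is also removed once it stops
  ${DRY_RUN:+echo} docker stop "$NAME"
}
```

After sourcing the script, call start_detached, then show_url to find the URL for your browser, and stop_container when you are done.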
The run_graph.ipynb notebook can be used to run hierarchical Stochastic Block Modelling.
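hSBM runs can take a long time, so it may be convenient to execute a notebook without keeping a browser session open. A minimal sketch, assuming a shell where the dependencies are installed (for example a terminal opened from Jupyter inside the container) and a notebook that needs no interactive input. run_notebook and DRY_RUN are hypothetical names, while the flags are standard `jupyter nbconvert` options. The exact location of run_graph.ipynb in the repository is not stated here, so pass its actual path.

```shell
#!/bin/sh
# Sketch: execute a notebook headlessly and save the outputs into it.
# run_notebook and DRY_RUN are hypothetical; set DRY_RUN=1 to only print
# the nbconvert command.
run_notebook() {
  case "$1" in
    *.ipynb) ;;
    *)
      echo "run_notebook: expected a .ipynb file, got '$1'" >&2
      return 1
      ;;
  esac
  # --inplace overwrites the notebook with its executed version;
  # a timeout of -1 disables the per-cell time limit (hSBM fits can be slow)
  ${DRY_RUN:+echo} jupyter nbconvert --to notebook --execute --inplace \
    --ExecutePreprocessor.timeout=-1 "$1"
}
```

For example, `run_notebook path/to/run_graph.ipynb` (placeholder path); the same call works for hSBM_postprocess.ipynb.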
Processed data from our analysis that is not available through git can be downloaded via Data Version Control (DVC):
dvc pull -r mydrive name_of_the_file_to_download.dvc
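`dvc pull` also accepts several .dvc targets at once, or none, in which case it fetches every file the repository tracks. A small sketch built on that behaviour; pull_data, DVC_REMOTE and DRY_RUN are hypothetical names, and mydrive is the remote used in the command above.

```shell
#!/bin/sh
# Sketch: pull DVC-tracked data from a remote (default: mydrive).
# pull_data, DVC_REMOTE and DRY_RUN are hypothetical; set DRY_RUN=1 to
# print the dvc command instead of running it.
pull_data() {
  # With no arguments "$@" is empty and dvc pulls every tracked file;
  # otherwise only the listed .dvc files are fetched.
  ${DRY_RUN:+echo} dvc pull -r "${DVC_REMOTE:-mydrive}" "$@"
}
```

For example, `pull_data` downloads everything, while `pull_data a.dvc b.dvc` would fetch only two (hypothetical) files. `dvc remote list` shows which remotes the repository configures, in case mydrive is missing from your setup.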
Please see LICENSE.