NOTE: Db_Compare only works on macOS or Linux.
Db_Compare
Compares the databases Reactome, KEGG, HPRD, WikiPathways and PhosphoSitePlus on their coverage of phospho/proteomic data.
The following conda command will create an environment called DbCompareConda with all dependencies installed:
conda create --name DbCompareConda \
--channel conda-forge \
--channel bioconda \
python=3 \
pandas \
requests \
urllib3 \
tqdm \
r=3.6 \
r-upsetr \
r-sna \
r-plotrix \
r-ggplot2 \
bioconductor-clusterprofiler \
bioconductor-org.hs.eg.db \
pandoc \
openjdk=8
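Once the environment is activated, a quick check like the one below can confirm that the dependencies resolved. This is a minimal sketch and not part of Db_Compare: the function name `missing_dependencies` is hypothetical, and the executable names (`Rscript`, `pandoc`, `java`) are assumptions about what the `r`, `pandoc` and `openjdk` packages put on the PATH. It does not check the R or Bioconductor libraries.

```python
import importlib.util
import shutil

# Python packages named in the conda command above.
PYTHON_PACKAGES = ["pandas", "requests", "urllib3", "tqdm"]
# Assumption: these executables come from the r, pandoc and openjdk packages.
EXECUTABLES = ["Rscript", "pandoc", "java"]


def missing_dependencies(packages=PYTHON_PACKAGES, executables=EXECUTABLES):
    """Return the names of packages and executables that cannot be found."""
    # find_spec returns None when a top-level module is not installed.
    missing = [p for p in packages if importlib.util.find_spec(p) is None]
    # shutil.which returns None when an executable is not on the PATH.
    missing += [e for e in executables if shutil.which(e) is None]
    return missing


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All checked dependencies were found.")
```

An empty result only means the names above were found; it does not verify versions.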
To run Db_Compare, follow these steps:
- Clone this repo (as below) or create a new directory and place the provided scripts, files and folders inside.
git clone https://github.com/HannahHuckstep/Db_Compare.git
- [OPTIONAL] Download the most current version of the databases into Db_Compare/. By default, Db_Compare will use the database files in refDatabaseFiles.zip and sigDatabaseFiles.zip. See the bullet points further down the page for the names and locations of the databases.
- Then run the bash script after navigating to the Db_Compare/ directory.
conda activate DbCompareConda
cd Db_Compare/
bash dbCompareScript.bash
- Once the analysis has completed, open dbCompareNotebook.html to view the resulting plots. The consistency analysis results can be found in each database's directory (for Reactome, PhosphoSitePlus, HPRD, and qPhos).
Download the following files and place them in the directory created in step 1.
- Download the Homo_sapiens OWL file for Reactome (from Reactome, BioPAX level 3) and name it RXM.owl
- Download the OWL file for PhosphoSitePlus (from PhosphoSitePlus, BioPAX:Kinase-substrate information) and name it PSP.owl
- Download the file for the full version of PhosphoSitePlus (from PhosphoSitePlus, Phosphorylation_site_dataset) and name it PSP_full.tsv
- Download the gmt file for WikiPathways (from WikiPathways, Homo_sapiens Gene lists per pathway(GMT)) and name it WP.tsv
- Download the flat files for HPRD (from INDRA Flat files) and rename HPRD_ID_MAPPINGS.txt to HPRD_UIDs.tsv and rename POST_TRANSLATIONAL_MODIFICATIONS.txt to HPRD_mods.tsv
- Download the flat files for BioGRID (from BioGRID Organism tab3), unzip and rename BIOGRID-ORGANISM-Homo_sapiens-4.2.193.tab3.txt to BG_UIDs.tsv, and unzip and rename BIOGRID-PTMS-4.3.194.ptm.zip to BG_mods.tsv
- Download the proteins (UniProt ids) for SIGNOR (from SIGNOR API) and rename to SIGNOR_UIDs.tsv
- Download the phosphorylations from SIGNOR (from SIGNOR Homo sapiens Phosphorylation Data) and rename to SIGNOR_mods.tsv
- Download the protein-protein interactions (PPI) from IMEX (from IMEX intact-micluster.txt from psi-mitab) and name the file IMEX.tsv
- Request the dataset from qPhos and name the file QPHOS_DATA.tsv
- Copy the qPhos supplementary data file from the 'about' page on the qPhos website and name the file QPHOS_SUPP_DATA.tsv
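
Before running the analysis with manually downloaded databases, it can help to confirm that every renamed file is in place. The sketch below is not part of Db_Compare; the function name `missing_database_files` is hypothetical, and it assumes all files sit directly in one directory (for example Db_Compare/), using the file names listed above.

```python
from pathlib import Path

# File names taken from the download instructions above.
EXPECTED_FILES = [
    "RXM.owl",
    "PSP.owl",
    "PSP_full.tsv",
    "WP.tsv",
    "HPRD_UIDs.tsv",
    "HPRD_mods.tsv",
    "BG_UIDs.tsv",
    "BG_mods.tsv",
    "SIGNOR_UIDs.tsv",
    "SIGNOR_mods.tsv",
    "IMEX.tsv",
    "QPHOS_DATA.tsv",
    "QPHOS_SUPP_DATA.tsv",
]


def missing_database_files(directory="."):
    """Return the expected database files that are absent from `directory`."""
    base = Path(directory)
    return [name for name in EXPECTED_FILES if not (base / name).is_file()]


if __name__ == "__main__":
    # Assumption: the script is run from the directory holding the files.
    missing = missing_database_files(".")
    if missing:
        print("Missing database files: " + ", ".join(missing))
    else:
        print("All expected database files are present.")
```

The check looks only at file names, so a misnamed download (for example WP.gmt instead of WP.tsv) is reported as missing, while a correctly named but empty or truncated file is not detected.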