Db_Compare

NOTE: Db_Compare only works on macOS or Linux.

Db_Compare compares the databases Reactome, KEGG, HPRD, WikiPathways and PhosphoSitePlus on their coverage of phospho/proteomic data.

Requirements

The following conda command creates an environment called DbCompareConda with all dependencies installed:

conda create --name DbCompareConda \
  --channel conda-forge \
  --channel bioconda \
  python=3 \
  pandas \
  requests \
  urllib3 \
  tqdm \
  r=3.6 \
  r-upsetr \
  r-sna \
  r-plotrix \
  r-ggplot2 \
  bioconductor-clusterprofiler \
  bioconductor-org.hs.eg.db \
  pandoc \
  openjdk=8
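Before running the pipeline, you may want to confirm that the activated environment actually provides the tools it needs. The sketch below is a hypothetical helper, not part of Db_Compare. The function names `check_tools`, `check_python_modules` and `check_r_packages` are illustrative, and the package lists simply mirror the conda command above (the R names are the ones those conda packages install, e.g. r-upsetr provides UpSetR).

```bash
#!/usr/bin/env bash
# Hypothetical environment check for DbCompareConda (not part of the repository).
# Run after: conda activate DbCompareConda

# Report each executable that is (or is not) on PATH; return non-zero if any are missing.
check_tools() {
  local tool status=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      status=1
    fi
  done
  return "$status"
}

# Check that the Python packages from the conda command can be imported.
check_python_modules() {
  python -c "import pandas, requests, urllib3, tqdm" && echo "Python modules OK"
}

# Check that the R/Bioconductor packages from the conda command can be loaded.
check_r_packages() {
  Rscript -e 'pkgs <- c("UpSetR", "sna", "plotrix", "ggplot2", "clusterProfiler", "org.Hs.eg.db");
    missing <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)];
    if (length(missing) > 0) { message("missing R packages: ", paste(missing, collapse = ", ")); quit(status = 1) };
    message("R packages OK")'
}
```

Usage, from inside the activated environment: `check_tools python Rscript pandoc java && check_python_modules && check_r_packages`. Each function returns non-zero on failure, so the chain stops at the first problem it finds.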

Usage

To run Db_Compare, follow these steps:

1. Clone this repository (as below), or create a new directory and place the provided scripts, files and folders inside it.

git clone https://github.com/HannahHuckstep/Db_Compare.git

2. [OPTIONAL] Download the most recent versions of the databases into Db_Compare/. By default, Db_Compare uses the database files in refDatabaseFiles.zip and sigDatabaseFiles.zip. See "Databases for step 2" below for the names and sources of the database files.

3. Navigate to the Db_Compare/ directory and run the bash script:

conda activate DbCompareConda
cd Db_Compare/
bash dbCompareScript.bash

Once the analysis has completed, open dbCompareNotebook.html to view the resulting plots. The consistency analysis results are in each database's directory (for Reactome, PhosphoSitePlus, HPRD and qPhos).
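Since Db_Compare runs on macOS or Linux, the report can be opened with `open` on macOS or `xdg-open` on most Linux desktops. The sketch below is hypothetical and not part of Db_Compare: `run_db_compare` keeps a copy of the pipeline's output in a log file (the name dbCompare.log is an assumption), and `open_report` assumes dbCompareNotebook.html is written to the Db_Compare/ directory you ran the script from.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Db_Compare). Run from inside Db_Compare/.

# Run the pipeline, echoing its output and saving a copy to dbCompare.log (assumed name).
# Returns the exit status of dbCompareScript.bash rather than that of tee.
run_db_compare() {
  bash dbCompareScript.bash 2>&1 | tee dbCompare.log
  return "${PIPESTATUS[0]}"
}

# Open the HTML report with the operating system's default viewer.
open_report() {
  local report="${1:-dbCompareNotebook.html}"
  if [[ ! -f "$report" ]]; then
    echo "report not found: $report" >&2
    return 1
  fi
  case "$(uname -s)" in
    Darwin) open "$report" ;;       # macOS
    Linux)  xdg-open "$report" ;;   # Linux desktops with xdg-utils installed
    *)      echo "unsupported OS: $(uname -s)" >&2; return 1 ;;
  esac
}
```

Usage: `conda activate DbCompareConda && cd Db_Compare/ && run_db_compare && open_report`. Returning `PIPESTATUS[0]` means a failed analysis stops the chain instead of being masked by the successful `tee`.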

Databases for step 2

Download the following files and place them in the directory created in step 1 (Db_Compare/ if you cloned the repository).

- Download the OWL file for Reactome, using the Homo_sapiens file (from Reactome, BioPAX level 3), and name it RXM.owl
- Download the OWL file for PhosphoSitePlus (from PhosphoSitePlus, BioPAX: Kinase-substrate information) and name it PSP.owl
- Download the file for the full version of PhosphoSitePlus (from PhosphoSitePlus, Phosphorylation_site_dataset) and name it PSP_full.tsv
- Download the GMT file for WikiPathways (from WikiPathways, Homo_sapiens gene lists per pathway (GMT)) and name it WP.tsv
- Download the flat files for HPRD (from INDRA flat files), then rename HPRD_ID_MAPPINGS.txt to HPRD_UIDs.tsv and POST_TRANSLATIONAL_MODIFICATIONS.txt to HPRD_mods.tsv
- Download the flat files for BioGRID (from BioGRID Organism tab3). Unzip the download and rename BIOGRID-ORGANISM-Homo_sapiens-4.2.193.tab3.txt to BG_UIDs.tsv, then unzip BIOGRID-PTMS-4.3.194.ptm.zip and rename the extracted file to BG_mods.tsv
- Download the proteins (UniProt IDs) for SIGNOR (from the SIGNOR API) and rename the file to SIGNOR_UIDs.tsv
- Download the phosphorylations from SIGNOR (from SIGNOR, Homo sapiens phosphorylation data) and rename the file to SIGNOR_mods.tsv
- Download the protein-protein interactions (PPIs) from IMEX (intact-micluster.txt, from the PSI-MITAB downloads) and name it IMEX.tsv
- Request the dataset from qPhos and name the file QPHOS_DATA.tsv
- Copy the qPhos supplementary data file from the 'about' page of the qPhos website and name the file QPHOS_SUPP_DATA.tsv
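If you supply your own downloads, a quick check can catch a misnamed file before the pipeline runs. The sketch below is hypothetical and not part of Db_Compare. It assumes the files sit directly in Db_Compare/, and the expected names are copied from the list above. `rename_hprd_files` performs the HPRD renaming step; if you only replace some of the databases, expect the others to be reported as missing.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of Db_Compare) for preparing the database files.
# File names come from the list above; the directory defaults to the current one.

expected_db_files=(
  RXM.owl PSP.owl PSP_full.tsv WP.tsv
  HPRD_UIDs.tsv HPRD_mods.tsv
  BG_UIDs.tsv BG_mods.tsv
  SIGNOR_UIDs.tsv SIGNOR_mods.tsv
  IMEX.tsv QPHOS_DATA.tsv QPHOS_SUPP_DATA.tsv
)

# Rename the HPRD flat files to the names the list above asks for.
rename_hprd_files() {
  local dir="${1:-.}"
  mv "$dir/HPRD_ID_MAPPINGS.txt" "$dir/HPRD_UIDs.tsv" &&
    mv "$dir/POST_TRANSLATIONAL_MODIFICATIONS.txt" "$dir/HPRD_mods.tsv"
}

# Print which expected files are present; return non-zero if any are missing.
check_db_files() {
  local dir="${1:-.}" f missing=0
  for f in "${expected_db_files[@]}"; do
    if [[ -f "$dir/$f" ]]; then
      echo "ok:      $f"
    else
      echo "missing: $f"
      missing=$((missing + 1))
    fi
  done
  echo "$missing file(s) missing"
  (( missing == 0 ))
}
```

Usage: `cd Db_Compare/ && rename_hprd_files && check_db_files`.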
