Db_Compare

NOTE: Db_Compare only runs on macOS or Linux.

Db_Compare compares the databases Reactome, KEGG, HPRD, WikiPathways and PhosphoSitePlus on their coverage of phospho/proteomic data.

Requirements

The following conda command creates an environment called DbCompareConda with all dependencies installed:

conda create --name DbCompareConda \
  --channel conda-forge \
  --channel bioconda \
  python=3 \
  pandas \
  requests \
  urllib3 \
  tqdm \
  r=3.6 \
  r-upsetr \
  r-sna \
  r-plotrix \
  r-ggplot2 \
  bioconductor-clusterprofiler \
  bioconductor-org.hs.eg.db \
  pandoc \
  openjdk=8
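
As a quick sanity check after creating the environment, the Python side of the dependency list can be probed from inside the activated environment. The sketch below is illustrative only and is not part of Db_Compare: the helper name `missing_modules` and the `PYTHON_DEPS` list are hypothetical, and the R, pandoc and Java dependencies are not checked.

```python
import importlib.util

# Python packages named in the conda command above.
# R packages, pandoc and openjdk are NOT covered by this sketch.
PYTHON_DEPS = ["pandas", "requests", "urllib3", "tqdm"]


def missing_modules(names):
    """Return the module names that cannot be found on the current import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Calling `missing_modules(PYTHON_DEPS)` after `conda activate DbCompareConda` should return an empty list if the Python packages were installed; any names it returns point at packages to reinstall.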

Usage

To run Db_Compare follow these steps:

  1. Clone this repo (as below), or create a new directory and place the provided scripts, files and folders inside it.
git clone https://github.com/HannahHuckstep/Db_Compare.git
  2. [OPTIONAL] Download the most current versions of the databases into Db_Compare/. By default, Db_Compare uses the database files in refDatabaseFiles.zip and sigDatabaseFiles.zip. See "Databases for step 2" below for the names and locations of the databases.

  3. Then run the bash script after navigating to the Db_Compare/ directory.

conda activate DbCompareConda
cd Db_Compare/
bash dbCompareScript.bash
  4. Once the analysis has completed, open dbCompareNotebook.html to view the resulting plots. The consistency analysis results can be found in each database's directory (for Reactome, PhosphoSitePlus, HPRD and qPhos).
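
To confirm that a run produced the report before opening it, a small check along these lines may help. This is a sketch under assumptions: it supposes that dbCompareNotebook.html is written to the Db_Compare/ directory itself (the location is not stated here), and the helper name `find_report` is hypothetical.

```python
from pathlib import Path


def find_report(repo_dir="Db_Compare", name="dbCompareNotebook.html"):
    """Return the path to the report if it exists in repo_dir, else None.

    Assumption: the report is written directly into repo_dir.
    """
    report = Path(repo_dir) / name
    return report if report.is_file() else None
```

If `find_report()` returns a path, it can be opened in the default browser with the standard library, for example `webbrowser.open(report.resolve().as_uri())`. A result of `None` suggests the script has not finished or writes the report elsewhere.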

Databases for step 2

Download the following files and place them in the directory created in step 1.

  • Download the OWL file for Reactome, use Homo_sapiens (from Reactome, BioPAX level 3) and name it RXM.owl
  • Download the OWL file for PhosphoSitePlus (from PhosphoSitePlus, BioPAX:Kinase-substrate information) and name it PSP.owl
  • Download the file for the full version of PhosphoSitePlus (from PhosphoSitePlus, Phosphorylation_site_dataset) and name it PSP_full.tsv
  • Download the gmt file for WikiPathways (from WikiPathways, Homo_sapiens Gene lists per pathway(GMT)) and name it WP.tsv
  • Download the flat files for HPRD (from INDRA Flat files), then rename HPRD_ID_MAPPINGS.txt to HPRD_UIDs.tsv and POST_TRANSLATIONAL_MODIFICATIONS.txt to HPRD_mods.tsv
  • Download the flat files for BioGRID (from BioGRID Organism tab3), unzip them and rename BIOGRID-ORGANISM-Homo_sapiens-4.2.193.tab3.txt to BG_UIDs.tsv, then unzip BIOGRID-PTMS-4.3.194.ptm.zip and rename the extracted PTM file to BG_mods.tsv
  • Download the proteins (UniProt ids) for SIGNOR (from SIGNOR API) and rename to SIGNOR_UIDs.tsv
  • Download the phosphorylations from SIGNOR (from SIGNOR Homo sapiens Phosphorylation Data) and rename to SIGNOR_mods.tsv
  • Download the protein-protein interaction (PPI) data for IMEX (from IMEX intact-micluster.txt from psi-mitab) and name it IMEX.tsv
  • Request the dataset from qPhos and name the file QPHOS_DATA.tsv
  • Copy the qPhos supplementary data file from the 'about' page on the qPhos website and name the file QPHOS_SUPP_DATA.tsv
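
Because the list above depends on exact file names, a short check can catch typos before the script is run. The sketch below only compares the target names from the bullet points against a directory. The helper name `missing_files` is hypothetical, and whether dbCompareScript.bash needs every file when only some databases are refreshed is not stated here, so treat the result as a hint rather than a hard requirement.

```python
from pathlib import Path

# Target file names taken from the bullet points above.
EXPECTED_FILES = [
    "RXM.owl", "PSP.owl", "PSP_full.tsv", "WP.tsv",
    "HPRD_UIDs.tsv", "HPRD_mods.tsv",
    "BG_UIDs.tsv", "BG_mods.tsv",
    "SIGNOR_UIDs.tsv", "SIGNOR_mods.tsv",
    "IMEX.tsv", "QPHOS_DATA.tsv", "QPHOS_SUPP_DATA.tsv",
]


def missing_files(directory, expected=EXPECTED_FILES):
    """Return the expected file names that are not present in directory."""
    base = Path(directory)
    return [name for name in expected if not (base / name).is_file()]
```

For example, `missing_files("Db_Compare")` lists any of the names above that have not yet been placed in the directory from step 1.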