Adding pathogen-host interaction data from PHI-base as xrefs to fungal, oomycete and bacterial pathogen genes
- A registry file with the locations of the core database server(s) and the production database (or -production_db $PROD_DB_URL specified).
- A PHI-base CSV file (see below).
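The commands below rely on a handful of environment variables. A minimal sketch of such a setup follows; the server alias, URL and paths are placeholders, not values required by the pipeline:

```
# Hypothetical environment for the commands below; adjust to your installation
CMD=mysql-ens-hive-prod-1        # server wrapper providing "details script" (assumed)
PROD_DB_URL='mysql://user:pass@prod-host:3306/ensembl_production'  # placeholder
REG_FILE=$PWD/registry.pm
BLAST_DB_DIRECTORY=$PWD/blast_db_cache
INPUT_ROTHAMSTED_FILE=$PWD/phi_base.filtered.csv
OUT_DIR=$PWD/phi_pipeline_out
mkdir -p "$OUT_DIR" "$BLAST_DB_DIRECTORY"
```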
init_pipeline.pl Bio::EnsEMBL::EGPipeline::PipeConfig::FindPHIBaseCandidates_conf \
$($CMD details script) \
-hive_force_init 1 \
-registry $REG_FILE \
-blast_db_dir $BLAST_DB_DIRECTORY \
-input_file $INPUT_ROTHAMSTED_FILE \
${OTHER_OPTIONS} \
2> $OUT_DIR/init.stderr \
1> $OUT_DIR/init.stdout
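Here $($CMD details script) is assumed to expand to the hive connection options (host, port, user, password), and ${OTHER_OPTIONS} carries the per-run settings described in the table below. A hypothetical example, with illustrative species names:

```
# Illustrative per-run options; several -species flags can be combined
OTHER_OPTIONS='-species fusarium_graminearum -species magnaporthe_oryzae -phi_release 4.6 -write 1'
```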
SYNC_CMD=$(cat $OUT_DIR/init.stdout | grep -- -sync'$' | perl -pe 's/^\s*//; s/"//g')
# should get something like
# beekeeper.pl -url $url -sync
LOOP_CMD=$(cat $OUT_DIR/init.stdout | grep -- -loop | perl -pe 's/^\s*//; s/\s*#.*$//; s/"//g')
# should get something like
# beekeeper.pl -url $url -reg_file $REG_FILE -loop
$SYNC_CMD 2> $OUT_DIR/sync.stderr 1> $OUT_DIR/sync.stdout
$LOOP_CMD 2> $OUT_DIR/loop.stderr 1> $OUT_DIR/loop.stdout
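Once the loop finishes, it is worth checking that the xrefs actually landed in the cores. A minimal sketch, assuming the entries are stored under the external_db name 'PHI' and that $CORE_DB_NAME points at one of the processed core databases (both are assumptions):

```
# Hypothetical post-run check; db_name 'PHI' and $CORE_DB_NAME are assumptions
$CMD "$CORE_DB_NAME" -e \
  "SELECT COUNT(*) FROM xref x JOIN external_db e USING (external_db_id) WHERE e.db_name = 'PHI'"
```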
option | default value | meaning |
---|---|---|
-species | | species to process; several -species options are possible |
-input_file | | path to the prepared PHI-base current snapshot |
-blast_db_dir | | cache with species-specific peptide files (generated from cores) |
-skip_blast_update | 0 | skip verifying and updating the cached BLAST DBs; 1 to skip |
-write | 0 | remove old and update with new xrefs; 1 to actually run (recommended), 0 for debugging |
-phi_release | | PHI-base release version (e.g. 4.6) |
N.B. The pipeline does not support collections at the moment.

The pipeline works with a slightly modified version of the phi-base_current.csv file: the header and unused columns need to be removed. A typical flow for creating such a file is:
- Get the raw phi-base_current.csv file
wget -c https://github.com/PHI-base/data/raw/master/releases/phi-base_current.csv
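Before filtering, a quick look at the raw file helps to confirm which columns are present (column names vary between PHI-base releases):

```
# Inspect the header and size of the raw download
head -1 phi-base_current.csv | tr ',' '\n' | head -20
wc -l phi-base_current.csv
```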
- Remove the header and unused columns using a helper script.
# TODO
> phi_base.filtered.csv
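Until the helper script lands, a hypothetical stand-in could look like the sketch below; the column subset is a placeholder that must match what the pipeline expects, and plain cut will mis-handle quoted commas inside CSV fields:

```
# Hypothetical stand-in for the missing helper script.
# The column list (1-10) is a placeholder, and cut does not honour CSV quoting.
tail -n +2 phi-base_current.csv | cut -d, -f1-10 > phi_base.filtered.csv
```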
- Normalise ontology terms, using the phi_ontology/phi-base_ontologies.pl script and the ontology maps within the phi_ontology dir.
PHIBO_DIR=${ENS_ROOT_DIR}/ensembl-production-imported/scripts/phi_ontology
VERSION=4.6 # PHI-base release used in the .obo file name below; set to your release
cp phi_base.filtered.csv phi_base.filtered.csv.orig # create a backup
perl ${PHIBO_DIR}/phi-base_ontologies.pl \
-phenotype_dictionary ${PHIBO_DIR}/ontologies_phenotypes.pl \
-conditions_dictionary ${PHIBO_DIR}/ontologies_conditions.pl \
-input_file phi_base.filtered.csv \
-obo_file_destination PHIbase_ontology.${VERSION}.obo
Params:
- -phenotype_dictionary -- Perl file with the phenotype (%mapped_phenotypes) dictionary
- -conditions_dictionary -- Perl file with the conditions (%mapped_conditions) dictionary
- -input_file -- result of the previous stage (no header, subset of columns); the input file is replaced with the normalised version
- -obo_file_destination -- optional arg to store the .obo file
If the script can't resolve all the entries in the file, it outputs the list of terms that need to be added to the dictionaries in the phi_ontology dir.
There are a few things to be aware of when running this script:
- The script replaces the input_file with the normalised version. Please keep a copy if you need one.
- The script creates (and removes on success) an Ontologies_temp_files temporary directory in the directory the -input_file is taken from; that directory therefore needs to be writable.
- Now you can use phi_base.filtered.csv to initialise the -input_file option of your pipeline.
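Before initialising the pipeline with it, a quick check of what normalisation changed can be reassuring (a sketch using the backup created above):

```
# Compare the normalised file against the pre-normalisation backup
diff phi_base.filtered.csv.orig phi_base.filtered.csv | head -20
```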
A few generic modules are taken from Common::RunnableDB, and a few from PHIBase.