This is the Phenotype Prediction Pipeline of the Knowledge Engine for Genomics (KnowEnG), an NIH BD2K Center of Excellence. The pipeline is used to infer an 'omic'-drug association.
The user must do the following so that the system can learn to predict the phenotype through regression:
- User must submit an ‘omic’ spreadsheet with samples as columns and genes as rows.
- User will also need to submit a phenotype value for each sample.
This will allow one to:
- Identify the best drug for a patient.
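
The input layout described above (genes as rows, samples as columns, one phenotype value per sample) can be sketched as follows. This is only an illustration: the gene names, sample names, values, and the tab-separated serialization are assumptions, not the pipeline's documented file format.

```python
import csv
import io

# Hypothetical toy data: 3 genes (rows) x 4 samples (columns).
samples = ["sample_1", "sample_2", "sample_3", "sample_4"]
omic_rows = {
    "GENE_A": [0.5, 1.2, 0.0, 2.1],
    "GENE_B": [1.1, 0.3, 0.7, 0.9],
    "GENE_C": [0.0, 0.0, 1.5, 0.4],
}
# One phenotype value (for example, a drug response) per sample.
phenotype = {"sample_1": 0.82, "sample_2": 0.10, "sample_3": 0.45, "sample_4": 0.67}


def spreadsheet_to_tsv(columns, rows):
    """Serialize a genes-by-samples table as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow([""] + list(columns))  # header: empty corner cell, then sample names
    for gene, values in rows.items():
        if len(values) != len(columns):
            raise ValueError(f"{gene} has {len(values)} values, expected {len(columns)}")
        writer.writerow([gene] + list(values))
    return buffer.getvalue()


def samples_without_phenotype(columns, phenotype_by_sample):
    """Return the samples that have no phenotype value."""
    return [s for s in columns if s not in phenotype_by_sample]


omic_tsv = spreadsheet_to_tsv(samples, omic_rows)
missing = samples_without_phenotype(samples, phenotype)
print(omic_tsv)
print("samples missing a phenotype:", missing)
```

Checking that every spreadsheet column has a phenotype value before running a regression avoids a shape mismatch between features and response later on.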
Given an omic spreadsheet of a collection of genes as well as the supplied phenotype value, the user will need to choose one of these options:
Options | Method | Parameters |
---|---|---|
Elastic Net | Elastic | elastic_net |
Lasso | Lasso | Lasso |
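
For reference, the two options differ only in their penalty term. The formulas below are the objectives given in the scikit-learn linear models documentation, assuming the pipeline relies on those estimators. The symbols are assumptions introduced here: $X$ is the samples-by-genes matrix (the transposed spreadsheet), $y$ the phenotype vector, $w$ the coefficient vector, $n$ the number of samples, $\alpha$ the penalty strength, and $\rho$ the L1 ratio.

Elastic Net:

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\rho\,\lVert w \rVert_1 + \frac{\alpha(1-\rho)}{2}\,\lVert w \rVert_2^2
$$

Lasso (the special case $\rho = 1$):

$$
\min_{w}\; \frac{1}{2n}\,\lVert y - Xw \rVert_2^2 + \alpha\,\lVert w \rVert_1
$$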
git clone https://github.com/KnowEnG/Phenotype_Prediction_Pipeline.git
apt-get install -y python3-pip
apt-get install -y libblas-dev liblapack-dev libatlas-base-dev gfortran
pip3 install numpy==1.11.1
pip3 install pandas==0.18.1
pip3 install scipy==0.18.0
pip3 install scikit-learn==0.17.1
apt-get install -y libfreetype6-dev libxft-dev
pip3 install matplotlib==1.4.2
pip3 install pyyaml
pip3 install knpackage
cd Phenotype_Prediction_Pipeline
cd test
make env_setup
- Run Elastic Net pipeline
make run_elastic_net
- Run Lasso pipeline
make run_lasso
Follow steps 1-3 above then do the following:
mkdir results_directory
Look for examples of run_parameters in:
Phenotype_Prediction_Pipeline/data/run_files/BENCHMARK_2_ElasticNet.yml
Phenotype_Prediction_Pipeline/data/run_files/BENCHMARK_1_PPP_Lasso.yml
Using Elastic Net
python3 ../src/phenotype_prediction.py -run_directory ./run_dir -run_file BENCHMARK_2_ElasticNet.yml
Using Lasso
python3 ../src/phenotype_prediction.py -run_directory ./run_dir -run_file BENCHMARK_1_PPP_Lasso.yml
Key | Value | Comments |
---|---|---|
Elastic Net | Method | http://scikit-learn.org/stable/modules/linear_model.html#elastic-net |
Lasso | Method | http://scikit-learn.org/stable/modules/linear_model.html#lasso |
results_directory | Directory | Directory to save the output files |
spreadsheet_name = features_train_clean.df
response_name = response_train_clean.df
test_spreadsheet_name = features_test_clean.df
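
The parameters listed above could be collected into a run file along these lines. This is a minimal sketch, not the pipeline's own tooling: the helper `render_run_file` is hypothetical, the `method` key name is an assumption (only its value comes from the options table), and real run files in `data/run_files` may require additional keys that are not shown here.

```python
def render_run_file(parameters):
    """Render a flat mapping as simple `key: value` YAML lines."""
    lines = [f"{key}: {value}" for key, value in parameters.items()]
    return "\n".join(lines) + "\n"


run_parameters = {
    # Assumed key name; the value is taken from the options table.
    "method": "elastic_net",
    # File names as listed in this README.
    "spreadsheet_name": "features_train_clean.df",
    "response_name": "response_train_clean.df",
    "test_spreadsheet_name": "features_test_clean.df",
    # Directory where the output files are saved.
    "results_directory": "./results_directory",
}

run_file_text = render_run_file(run_parameters)
print(run_file_text)
```

Comparing the rendered text with one of the benchmark run files is the safest way to confirm the exact key names before running the pipeline.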
Gene Name | Prediction |
---|---|
User Gene 1 | Float |
... | ... |
User Gene n | Float |