This repository contains data and scripts for reproducing evaluation results presented in Accurately modeling biased random walks on weighted networks using node2vec+. Node2vec+ is implemented as an extension to PecanPy, a fast and memory efficient implementation of node2vec.
Follow the scripts below to execute full evaluation provaided in this repository. For more details, check out the sections below.
PROCEED WITH CAUTION: the full evaluation consumes significant amount of space and computational resources (via SLURM)
# Set up conda environment
source config.sh setup
# Download and set up gene interaction network data
source config.sh download_ppis
# Submit all evaluation jobs
sh submit_all.sh
After all evaluation jobs are finished successfully, open the jupyter notebooks in plot/
and generate evaluation plots.
We provide a simple script to set up the conda environemnt node2vecplus-bench
:
source config.sh setup
To remove the environment, simply run
source config.sh cleanup
Alternatively, user can set up the environment manually instead of using the config.sh
script.
Additionally all the required dependencies can be found in requirements.txt
.
-
Step1. Set up node2vecpluc-bench conda environment with Python 3.8
conda create -n node2vecplus-bench python=3.8 && conda activate node2vecplus-bench
-
Step2. Set up PyTorch related packages with CUDA 10.2 (checkout the PyTorch website for other CUDA/CPU installation options)
conda install pytorch=1.9 torchvision cudatoolkit=10.2 -c pytorch -y pip install torch-geometric==2.0.0 torch-scatter torch-sparse torch-cluster -f https://data.pyg.org/whl/torch-1.9.0+cu102.html
-
Step3. Install rest of the depencies for reproducing experiemnts
pip install -r requirements.txt
- Hierarchical cluster graphs
- Standard benchmarking datasets
BlogCatalog
Wikipedia
- Human gene interaction networks (need to download, see below)
STRING
HumanBase*
GTExCoExp*
The hierarchical cluster graphs are constructed by taking RBF of point coulds generated in the Euclidean space, and hence each graph natually exhibits a hierarchical community structure (more info in the supplementary materials of the paper). Each network is assocaited with two tasks, cluster classification and level classification.
The BlogCatalog and Wikipedia networks, along with the associated node labels, are obtained from SNAP-node2vec. The networks are processed by removing isolated nodes and converting to edge list tsv files.
source config.sh download_ppis
Under the root directory of the repository, download gene interaction networks from Zenodo
curl -o node2vecplus_bench_ppis.tar.gz https://zenodo.org/record/7007164/files/node2vecplus_bench_ppis.tar.gz
(Recommended) Although Zenodo provide a nice feature for versioning datasets with DOI, downloading could be a bit slow. Thus, we provide an alternative download option from Dropbox. The file should be in sync with the latest dataset version on Zenodo.
curl -L -o node2vecplus_bench_ppis.tar.gz https://www.dropbox.com/s/aettebq5lbgu1cu/node2vecplus_bench_ppis-v1.0.0.tar.gz?dl=1
After the zipped tar ball is downloaded, extract and place them under data/networks
by
tar -xzvf node2vecplus_bench_ppis.tar.gz --transform 's/node2vecplus_bench_ppis/ppi/' --directory data/networks
This repository contains the following scripts for reproducing the evaluation results
eval_hcluster.py
- evaluate performnace of node2vec(+) using hierarchical cluster graphseval_realworld_networks.py
- evaluate performance of node2vec(+) using commonly benchmarked real-world datasets BlogCatalog and Wikipediaeval_gene_classification_n2v.py
- evalute performance of node2vec(+) for gene classification tasks using gene interaction networkseval_gene_classification_gnn.py
- evaluate performance of GNNs for gene classification tasks using gene interaction networks
Each one of the above scripts can be run from command line, e.g.
cd script
# example of evaluating K3L2 hierarchical cluster graph using node2vec with q=10
python evalu_hcluster.py --network K3L2 --q 10 --nooutput
# sample as above but using node2vec+
python evalu_hcluster.py --netwokr K3L2 --q 10 --nooutput --extend
# check other commandline keyward options
python eval_hcluster.py --help
If --nooutput
is not specified, then the evaluation results are saved to result/
as .csv
.
Alternatively, one can submit evaluation jobs using
cd slurm
# submit all evaluations on hierarchical cluster graphs
sbatch eval_hcluster_all.sb
# submit all evaluations for BlogCatalog and Wikipedia
sbatch eval_realworld_networks.sb
# submit all evaluations for gene classifications using node2vec+
sbatch eval_gene_classification_n2vplus.sb
# submit all evaluations for gene classifications using node2vec
sbatch eval_gene_classification_n2v.sb
# submit all evaluations for gene classifications using GNNs
sbatch eval_gene_classification_gnn.sb
# submit all evaluations for tissue-specific gene classifications using node2vec+
sbatch eval_tissue_gene_classification_n2vplus.sb
# submit all evaluations for tissue-specific gene classifications using node2vec
sbatch eval_tissue_gene_classification_n2v.sb
Or submitting all evaluations above by simply running
sh submit_all.sh
Note: depending on the your preference you can modify the nodes requirement in submit_all.sh
for individual jobs script.
First, tune the architecture of GNN (hidden dimension, number of layers, residual connection)
cd gnn_tuning
sh tune_gnn_architecture.sb
Then, fix the best architecture and tune the rest of the training parameters (learning rate, dropout rate, weight decay)
cd gnn_tuning
sh tune_gnn_params.sb
To aggregate the gnn tuning results, use aggregate_tuning_results.py
:
python gnn_tuning/aggregate_tuning_results.py
Finally, use the GNN tuning notebook to analyze the results and find the optimal GNN configurations.
Example test commands
python eval_gene_classification_n2v.py --gene_universe HBGTX --network HumanBaseTop-global --p 1 --q 1 --nooutput --test
Install additional dev dependencies
pip install -r requirements-dev.txt
Once the network data are set up and placed under data/networks/ppi
, run
process_labels.py
- Make new dataset version on zenodo and upload corresponding file
- Upload file to dropbox for alternative download option
- Update README (Zenodo DOI, Zenodo link, Dropbox link)
- Update
config.sh
Dropbox link
If you find this work useful, please consider citing our paper:
@article {liu2022node2vecplus,
title = {Accurately modeling biased random walks on weighted networks using node2vec+},
author = {Liu, Renming and Hirn, Matthew and Krishnan, Arjun},
year = {2022},
doi = {10.1101/2022.08.14.503926},
publisher = {Cold Spring Harbor Laboratory},
journal = {bioRxiv}
URL = {https://www.biorxiv.org/content/early/2022/08/15/2022.08.14.503926},
eprint = {https://www.biorxiv.org/content/early/2022/08/15/2022.08.14.503926.full.pdf},
}