
## Environment & Data

```bash
git checkout 3543e65287e51a42d7abf9fecaf6f4f881743475

# Set up environment
pip install virtualenv
virtualenv -p python3.5 birch_env
source birch_env/bin/activate

# Install dependencies
pip install Cython  # jnius dependency
pip install -r requirements.txt

# Install NVIDIA Apex for mixed-precision support
git clone https://github.com/NVIDIA/apex
cd apex && pip install -v --no-cache-dir . && cd ..

# Set up Anserini
git clone https://github.com/castorini/anserini.git
cd anserini && mvn clean package appassembler:assemble
cd eval && tar xvfz trec_eval.9.0.4.tar.gz && cd trec_eval.9.0.4 && make && cd ../../..

# Download data and models
wget https://zenodo.org/record/3269890/files/birch_data.tar.gz
tar -xzvf birch_data.tar.gz
```
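
After extraction, a quick sanity check that the build and download succeeded (the `data` and `models` directory names are assumed from the steps below; adjust if your layout differs):

```bash
# trec_eval should print its usage; data/ and models/ come from birch_data.tar.gz (assumed layout)
anserini/eval/trec_eval.9.0.4/trec_eval -h | head -n 3
ls data models
```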

## Dataset

```bash
python src/robust04_cv.py --anserini_path <path/to/anserini> --index_path <path/to/index> --cv_fold <2, 5>
```

This step retrieves documents to a depth of 1000 for each query and splits them into sentences to generate the cross-validation folds. You may skip to the next step and use the downloaded data under `data/datasets`.
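
For example, a five-fold run, assuming Anserini was cloned into the current directory as above and a prebuilt Robust04 index lives at `indexes/robust04` (both paths are placeholders for your own setup):

```bash
# Placeholder paths: substitute your Anserini clone and Robust04 index
python src/robust04_cv.py --anserini_path anserini --index_path indexes/robust04 --cv_fold 5
```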

## Training

```bash
python src/main.py --mode training --collection mb --qrels_file qrels.microblog.txt --batch_size <batch_size> --eval_steps <eval_steps> --learning_rate <learning_rate> --num_train_epochs <num_train_epochs> --device cuda
```
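
A concrete invocation might look like the following; these hyperparameter values are illustrative assumptions, not the exact settings used in the paper:

```bash
# Illustrative hyperparameters: tune for your hardware and data
python src/main.py --mode training --collection mb --qrels_file qrels.microblog.txt \
    --batch_size 16 --eval_steps 1000 --learning_rate 1e-5 --num_train_epochs 3 --device cuda
```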

## Inference

```bash
python src/main.py --mode inference --experiment <qa_2cv, mb_2cv, qa_5cv, mb_5cv> --collection <robust04_2cv, robust04_5cv> --model_path <models/saved.mb_3, models/saved.qa_2> --load_trained --batch_size <batch_size> --device cuda
```

Note that this step takes a long time. If you don't want to evaluate the pretrained models, you may skip to the next step and evaluate with our predictions under `data/predictions`.
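
For example, to score the two-fold splits with the pretrained BERT(MB) model (the experiment, collection, and model path pairing follows the option lists above; the batch size is an arbitrary example):

```bash
python src/main.py --mode inference --experiment mb_2cv --collection robust04_2cv \
    --model_path models/saved.mb_3 --load_trained --batch_size 32 --device cuda
```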

## Evaluation

### BM25+RM3 Baseline

```bash
./eval_scripts/baseline.sh <path/to/anserini> <path/to/index> <2, 5>
```
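
For instance, to produce the two-fold baseline run (both paths are placeholders for your own Anserini clone and index):

```bash
./eval_scripts/baseline.sh anserini indexes/robust04 2
```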

### Sentence Evidence

- Compute document scores

Set the last argument to `True` if you want to tune the hyperparameters first; to use the default hyperparameters, set it to `False`.

```bash
./eval_scripts/test.sh <qa_2cv, mb_2cv, qa_5cv, mb_5cv> <2, 5> <path/to/anserini> <True, False>
```
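
For example, to score the two-fold BERT(MB) experiment with the default hyperparameters (the Anserini path is a placeholder):

```bash
./eval_scripts/test.sh mb_2cv 2 anserini False
```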
- Evaluate with trec_eval

```bash
./eval_scripts/eval.sh <bm25+rm3_2cv, qa_2cv, mb_2cv, bm25+rm3_5cv, qa_5cv, mb_5cv> <path/to/anserini> qrels.robust2004.txt
```
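
For example, to evaluate the same two-fold BERT(MB) run:

```bash
./eval_scripts/eval.sh mb_2cv anserini qrels.robust2004.txt
```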

## Results on Robust04

  • "Paper 1" based on two-fold CV:
Model AP P@20
Paper 1 (two fold) 0.2971 0.3948
BM25+RM3 (Anserini) 0.2987 0.3871
1S: BERT(QA) 0.3014 0.3928
2S: BERT(QA) 0.3003 0.3948
3S: BERT(QA) 0.3003 0.3948
1S: BERT(MB) 0.3241 0.4217
2S: BERT(MB) 0.3240 0.4209
3S: BERT(MB) 0.3244 0.4219
  • "Paper 2" based on five-fold CV:
Model AP P@20
Paper 2 (five fold) 0.272 0.386
BM25+RM3 (Anserini) 0.3033 0.3974
1S: BERT(QA) 0.3102 0.4068
2S: BERT(QA) 0.3090 0.4064
3S: BERT(QA) 0.3090 0.4064
1S: BERT(MB) 0.3266 0.4245
2S: BERT(MB) 0.3278 0.4267
3S: BERT(MB) 0.3278 0.4287

See this paper for the exact fold settings.

## Replication Log


## How do I cite this work?

```
@article{yang2019simple,
  title={Simple Applications of BERT for Ad Hoc Document Retrieval},
  author={Yang, Wei and Zhang, Haotian and Lin, Jimmy},
  journal={arXiv preprint arXiv:1903.10972},
  year={2019}
}
```