conda install pybind11
conda install six
conda install nltk
conda install pyramid
pip install boto3
pip install requests
pip install sentencepiece
Install the GPU build of PyTorch: https://pytorch.org
Install apex: https://github.com/NVIDIA/apex (Linux install instructions: https://github.com/NVIDIA/apex#linux)
Install wikiextractor: https://github.com/attardi/wikiextractor.git
Install dllogger: https://github.com/NVIDIA/dllogger.git
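After the installs above, a quick sanity check can confirm the Python-side dependencies are importable. A minimal sketch using only the standard library; the module list mirrors the packages above (note these are assumed import names, which can differ from package names):

```python
import importlib.util

def check_modules(names):
    """Return a dict mapping each module name to whether it is importable."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# Assumed import names for the packages installed above.
required = ["pybind11", "six", "nltk", "boto3", "requests",
            "sentencepiece", "torch", "apex"]

if __name__ == "__main__":
    for name, ok in check_modules(required).items():
        print(f"{name}: {'ok' if ok else 'MISSING'}")
```

Anything reported MISSING should be reinstalled before running the pretraining scripts.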
Preprocess the dataset (optional; the dataset is already preprocessed and included in the repo as my-bert_text_sentence)
bash megatron18_bert_preprocess_data.sh
mpirun -np 1 megatron18_bert_pretrain_distributed.sh
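The preprocessing scripts above feed Megatron-LM's preprocess tooling, which consumes loose JSON: one document per line with the raw text under a "text" key. A minimal sketch of producing that format from plain-text documents (the function name is an illustration, not part of the repo):

```python
import json

def to_loose_json_lines(documents):
    """Render documents as loose JSON: one object per line with a "text" key,
    the input format Megatron-LM's preprocessing script consumes."""
    return [json.dumps({"text": doc}, ensure_ascii=False) for doc in documents]

if __name__ == "__main__":
    docs = ["First document.", "Second document, one per line."]
    print("\n".join(to_loose_json_lines(docs)))
```

Writing these lines to a file yields an input suitable for the BERT and GPT-2 preprocessing steps alike.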
Preprocess the dataset (optional; the dataset is already preprocessed and included in the repo as my-gpt2_text_sentence)
bash megatron18_bert_preprocess_data.sh
mpirun -np 1 megatron17_gpt2_pretrain_distributed.sh
wget http://www.cs.cmu.edu/~glai1/data/race/RACE.tar.gz
tar -zxf RACE.tar.gz
mv RACE data
bash megatron_bert_race_eval.sh
https://github.com/google-research/bert
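For reference when inspecting eval results: each RACE passage is stored as JSON with "article", "questions", "options" (four choices per question), and "answers" (letters A-D). A small sketch, under that assumed layout, of mapping gold answer letters to option indices and scoring predictions (helper names are illustrative):

```python
def letter_to_index(letter):
    """Map a RACE answer letter ("A".."D") to an option index 0..3."""
    return ord(letter.upper()) - ord("A")

def accuracy(predicted_indices, gold_letters):
    """Fraction of questions where the predicted option index matches the gold letter."""
    gold = [letter_to_index(l) for l in gold_letters]
    correct = sum(p == g for p, g in zip(predicted_indices, gold))
    return correct / len(gold)
```

This is the same accuracy metric the eval script reports, computed here on raw predictions for quick cross-checking.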
pip install deepspeed
Presplit the dataset (optional; the dataset is already presplit and included as data/wikipedia/wiki_AA_presplited.json)
python presplit_sentences_json.py /scratch2/xluo/program/sosp21_exp/data/wikipedia/wiki_AA.json /scratch2/xluo/program/sosp21_exp/data/wikipedia/wiki_AA_presplited.json
Open DeepSpeed/DeepSpeedExamples/Megatron-LM/data_utils/corpora.py
set PATH = 'data/wikipedia/wiki_AA_presplited.json'
Apply deepspeed.patch to DeepSpeedExamples
mpirun -np 1 ./deepspeed_bert_pretrain_mp.sh
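presplit_sentences_json.py rewrites each document so the "text" field holds one sentence per line, which BERT-style preprocessing needs for next-sentence sampling. A naive stand-in that shows the transformation on one loose-JSON line; the regex split here is a simplification (a real sentence tokenizer such as nltk's would be used in practice):

```python
import json
import re

def presplit(line):
    """Take one loose-JSON line ({"text": ...}) and return the same object
    with the text re-joined as one sentence per line.
    The regex split is a naive stand-in for a real sentence tokenizer."""
    doc = json.loads(line)
    sentences = [s.strip()
                 for s in re.split(r"(?<=[.!?])\s+", doc["text"])
                 if s.strip()]
    doc["text"] = "\n".join(sentences)
    return json.dumps(doc, ensure_ascii=False)
```

Applying this line-by-line to wiki_AA.json yields a file in the same shape as wiki_AA_presplited.json.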
module load open-ce/1.1.3-py37-0
module load gcc/6.4.0
module load cuda/10.2
Known issue with PyTorch 1.7 (pytorch/pytorch#47138):
nvbert: RuntimeError: default_program(57): error: identifier "aten_mul_flat__1" is undefined
megatron-lm: AttributeError: module 'torch' has no attribute 'amp_foreach_non_finite_check_and_unscale'
apex may fail to find the C++ compiler; make sure a gcc toolchain is available (e.g. the gcc module loaded above)
For fp32 runs, apply this patch: NVIDIA/Megatron-LM#36