
ESPnet extensions for semi-supervised end-to-end speech recognition

This repository contains the evaluation scripts used in our paper:

@inproceedings{Karita2018,
  author={Shigeki Karita and Shinji Watanabe and Tomoharu Iwata and Atsunori Ogawa and Marc Delcroix},
  title={Semi-Supervised End-to-End Speech Recognition},
  year=2018,
  booktitle={Proc. Interspeech 2018},
  pages={2--6},
  doi={10.21437/Interspeech.2018-1746},
  url={http://dx.doi.org/10.21437/Interspeech.2018-1746}
}

The full PDF is available at https://www.isca-speech.org/archive/Interspeech_2018/abstracts/1746.html.

how to set up

$ git clone https://github.com/nttcslab-sp/espnet-semisupervised --recursive
$ cd espnet-semisupervised/espnet/tools; make PYTHON_VERSION=3 -f conda.mk
$ cd ../..
$ ./run.sh --gpu 0 --wsj0 <your-wsj0-path> --wsj1 <your-wsj1-path>

NOTE: you need to install PyTorch 0.3.1.
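
As a quick sanity check before launching run.sh (this is just a sketch, not part of the recipe), you can confirm from the environment created by conda.mk that PyTorch 0.3.1 is installed and that a GPU is visible if you intend to pass --gpu 0:

```python
# Quick environment sanity check (not part of the recipe).
import torch

# The recipe is written against PyTorch 0.3.1; other versions may break it.
assert torch.__version__.startswith("0.3.1"), \
    "expected PyTorch 0.3.1, found %s" % torch.__version__

# Only relevant if you plan to run with --gpu 0.
print("CUDA available:", torch.cuda.is_available())
```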

scripts

in root dir

  • run.sh : end-to-end recipe for this experiment (do not forget to set --gpu 0 if you have a GPU)
  • sbatch.sh : Slurm job script for several paired/unpaired data ratios and hyperparameter search (requires a finished run_retrain_wsj.sh expdir for pretrained model parameters)

in shell/ dir

  • show_results.sh : summarize CER/WER/SER from the decoded results of the dev93/eval92 sets (usage: `show_results.sh exp/train_si84_xxx`)
  • decode.sh : script to decode and evaluate a trained model (usage: `decode.sh --expdir exp/train_si84_xxx`)
  • debug.sh : we recommend sourcing debug.sh before using ipython so that the paths to everything you need are set

in python/ dir

  • asr_train_loop_th.py : Python script for the initial training with the paired dataset (train_si84)
  • retrain_loop_th.py : Python script for retraining with the unpaired dataset (train_si284)
  • unsupervised_recog_th.py : Python script for decoding with the retrained model
  • unsupervised.py : implements the PyTorch models for paired/unpaired learning (a simplified sketch of the retraining objective follows this list)
  • results.py : implements a Chainer-like reporter (without a Chainer iterator) used in the training loop
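
To give a rough idea of what the retraining scripts optimize, here is a minimal sketch, not the repository's actual unsupervised.py: a supervised ASR loss on the paired batch is mixed with an inter-domain loss, here an RBF-kernel MMD between speech-encoder and text-encoder hidden states from the unpaired batch, with a mixing weight analogous to the data_loss value that appears in the result paths below. The function names and the exact weighting are illustrative assumptions.

```python
# Illustrative sketch only -- not the code in python/unsupervised.py.
# Mixes a supervised ASR loss on paired data with an inter-domain loss that
# pulls speech-encoder and text-encoder hidden states together (here an
# RBF-kernel MMD; the repository also lists GAN and KL/Gauss-logdet variants).
import torch


def rbf_mmd(x, y, sigma=1.0):
    """Biased squared-MMD estimate between hidden vectors x: (n, d) and y: (m, d)."""
    def gram(a, b):
        d2 = (a.unsqueeze(1) - b.unsqueeze(0)).pow(2).sum(-1)  # pairwise squared distances
        return torch.exp(-d2 / (2.0 * sigma ** 2))
    return gram(x, x).mean() + gram(y, y).mean() - 2.0 * gram(x, y).mean()


def retrain_loss(asr_loss_paired, h_speech_unpaired, h_text_unpaired, data_loss=0.9):
    """`data_loss` plays the role of the data_loss hyperparameter in the expdir names;
    the actual formulation in the paper/repository may differ."""
    domain_loss = rbf_mmd(h_speech_unpaired, h_text_unpaired)
    return data_loss * asr_loss_paired + (1.0 - data_loss) * domain_loss
```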

results

| train_set | dev93 Acc (%) | dev93 CER (%) | eval92 CER (%) | dev93 WER (%) | eval92 WER (%) | dev93 SER (%) | eval92 SER (%) | path |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| train_si84 (7138 utt, 15 hours) | 77.6 | 25.4 | 15.8 | 61.9 | 44.2 | 99.8 | 98.5 | exp/train_si84_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150 |
| + train_si284 RNNLM | | 19.3 | 16.6 | 51.3 | 47.7 | 99.8 | 99.7 | exp/rnnlm_train_si84_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_epochs15 |
| + unpaired train_si284 retrain | 83.8 | 28.2 | 15.6 | 61.2 | 40.5 | 99.6 | 97.6 | ./exp/train_si84_retrain_None_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9 |
| + RNNLM | | 22.1 | 17.2 | 51.6 | 44.2 | 99.0 | 99.4 | ./exp/train_si84_retrain_None_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9/rnnlm0.1 |
| + unpaired train_si284 retrain w/ GAN-si84 | 83.5 | 26.3 | 15.0 | 59.9 | 40.0 | 99.4 | 97.3 | exp/train_si84_paired_hidden_gan_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.5_train_si84_epochs15 |
| + unpaired train_si284 retrain w/ KL-si84 | 83.6 | 28.5 | 15.6 | 60.5 | 40.4 | 99.6 | 97.3 | exp/train_si84_paired_hidden_gausslogdet_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_epochs15 |
| + unpaired train_si284 retrain w/ GAN | 84.2 | 22.1 | 17.9 | 50.9 | 44.2 | 99.2 | 99.4 | ./exp/train_si84_retrain84_gan_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_iter5 |
| + RNNLM | | 22.1 | 17.9 | 50.9 | 44.2 | 99.2 | 99.4 | ./exp/train_si84_retrain84_gan_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.9_train_si84_iter5/rnnlm0.2 |
| + unpaired train_si284 retrain w/ KL | 84.0 | 24.8 | 14.4 | 58.1 | 39.5 | 99.6 | 96.4 | ./exp/train_si84_ret3_gausslogdet_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.9_st0.5_train_si84_epochs30 |
| + RNNLM | | 20.0 | 16.9 | 48.9 | 42.7 | 99.0 | 99.1 | ./exp/train_si84_retrain84_gausslogdet_alpha0.5_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.99_st0.99_train_si84/rnnlm0.2 |
| + unpaired train_si284 retrain w/ MMD | 82.9 | 25.9 | 13.9 | 59.7 | 38.4 | 99.2 | 96.7 | ./exp/train_si84_ret3_mmd_alpha0.5_bnFalse_adadelta_lr1.0_bs30_el6_dl1_att_location_batch30_data_loss0.5_st0.99_train_si84_epochs30 |
| train_si284 (37416 utt, 81 hours) | 93.9 | 8.1 | 6.3 | 23.8 | 18.9 | 92.4 | 87.4 | exp/train_si284_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150 |
| + train_si284 RNNLM | | 7.9 | 6.1 | 22.7 | 18.3 | 89.7 | 84.1 | ./exp/rnnlm_train_si284_blstmp_e6_subsample1_2_2_1_1_unit320_proj320_d1_unit300_location_aconvc10_aconvf100_mtlalpha0.5_adadelta_bs30_mli800_mlo150_epochs15 |
  • Acc: character accuracy during training with forced decoding
  • CER: character error rate (edit-distance-based error)
  • WER: word error rate (edit-distance-based error)
  • SER: sentence error rate (exact-match error); a minimal sketch of how these rates are computed follows this list
  • all the experiment paths starting with exp/... are located under /nfs/kswork/kishin/karita/experiments/espnet-unspervised/egs/wsj/unsupervised on the NTT ks-servers
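
For reference, here is a minimal illustration of how the edit-distance-based rates (CER/WER) and the exact-match sentence error rate can be computed. This is not the recipe's scoring script, just a sketch of the definitions above; the function names are hypothetical.

```python
# Sketch of the metric definitions above (not the recipe's scoring scripts).

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (characters for CER, words for WER)."""
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            # deletion, insertion, substitution
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (r != h))
    return d[-1]


def error_rates(refs, hyps):
    """Return (edit-distance error rate in %, sentence error rate in %)."""
    dist = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    ser = sum(r != h for r, h in zip(refs, hyps)) / len(refs)
    return 100.0 * dist / total, 100.0 * ser


# Passing character sequences gives CER; whitespace-split word sequences give WER.
cer, ser = error_rates(["hello world"], ["hello word"])
```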

results with smaller paired training data

See plot.png in the repository.

contact

email: karita.shigeki@lab.ntt.co.jp
