- Feature info: using fbank feature, cmvn, dither, online speed perturb
- Training info: train_conformer_bidecoder_large.yaml, kernel size 31, lr 0.002, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 1.0
- Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30
- Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
- Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/librispeech/20210610_conformer_bidecoder_exp.tar.gz
- LM-tgmed: 3-gram.pruned.1e-7.arpa.gz
- LM-tglarge: 3-gram.arpa.gz
- LM-fglarge: 4-gram.arpa.gz

| decoding mode | test clean WER (%) | test other WER (%) |
|---|---|---|
| ctc prefix beam search | 2.96 | 7.14 |
| attention rescoring | 2.66 | 6.53 |
| LM-tgmed + attention rescoring | 2.78 | 6.32 |
| LM-tglarge + attention rescoring | 2.68 | 6.10 |
| LM-fglarge + attention rescoring | 2.65 | 5.98 |
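
The decoding weights listed for this model can be read as coefficients in a score combination. The following is a minimal sketch, assuming attention rescoring blends the left-to-right and right-to-left decoder scores with the reverse weight and then adds the CTC score scaled by `ctc_weight`. The function name `rescoring_score` and the example log scores are hypothetical and are not taken from the recipe.

```python
def rescoring_score(l2r_score, r2l_score, ctc_score,
                    ctc_weight=0.3, reverse_weight=0.5):
    """Combine per-hypothesis log scores (assumed formula, illustration only)."""
    # Blend the two decoder directions; reverse_weight=0 would ignore the right-to-left decoder.
    decoder_score = (1.0 - reverse_weight) * l2r_score + reverse_weight * r2l_score
    # Add the CTC score of the same hypothesis, scaled by ctc_weight.
    return decoder_score + ctc_weight * ctc_score


# Hypothetical n-best list: (text, left-to-right score, right-to-left score, CTC score)
hyps = [
    ("hyp a", -4.0, -5.0, -6.0),
    ("hyp b", -4.2, -4.4, -5.0),
]

# Pick the hypothesis with the highest combined score.
best = max(hyps, key=lambda h: rescoring_score(h[1], h[2], h[3]))
print(best[0])
```

With these made-up scores the second hypothesis wins, because its stronger right-to-left and CTC scores outweigh a slightly weaker left-to-right score.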
- Feature info: using fbank feature, cmvn, dither, online speed perturb
- Training info: train_conformer.yaml, kernel size 31, lr 0.004, batch size 12, 8 gpu, acc_grad 4, 120 epochs, dither 0.1
- Decoding info: ctc_weight 0.5, average_num 30
- Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
- Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/librispeech/20210216_conformer_exp.tar.gz
- LM-fglarge: 4-gram.arpa.gz

| decoding mode | test clean WER (%) | test other WER (%) |
|---|---|---|
| ctc greedy search | 3.51 | 9.57 |
| ctc prefix beam search | 3.51 | 9.56 |
| attention decoder | 3.05 | 8.36 |
| attention rescoring | 3.18 | 8.72 |
| attention rescoring (beam 50) | 3.12 | 8.55 |
| LM-fglarge + attention rescoring | 3.09 | 7.40 |
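
To compare decoding modes on a common scale, it can help to express each result as a relative reduction against the CTC greedy search baseline. The sketch below uses the numbers from the table above and assumes they are word error rates in percent. The helper name `relative_reduction` is introduced here for illustration only.

```python
def relative_reduction(baseline, improved):
    """Relative error reduction in percent, going from baseline to improved."""
    return 100.0 * (baseline - improved) / baseline


# (test clean, test other) values copied from the table above
results = {
    "ctc greedy search": (3.51, 9.57),
    "attention decoder": (3.05, 8.36),
    "attention rescoring": (3.18, 8.72),
    "LM-fglarge + attention rescoring": (3.09, 7.40),
}

baseline_clean, baseline_other = results["ctc greedy search"]
reductions = {
    mode: (relative_reduction(baseline_clean, clean),
           relative_reduction(baseline_other, other))
    for mode, (clean, other) in results.items()
}

for mode, (clean_gain, other_gain) in reductions.items():
    print(f"{mode}: clean {clean_gain:.1f}%, other {other_gain:.1f}%")
```

On these figures the 4-gram LM helps test other far more than test clean, while the attention decoder gives the largest gain on test clean.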
- Feature info: using fbank feature, cmvn, no speed perturb, dither
- Training info: train_u2++_conformer.yaml, lr 0.001, batch size 24, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
- Decoding info: ctc_weight 0.3, reverse_weight 0.5, average_num 30
- Git hash: 65270043fc8c2476d1ab95e7c39f730017a670e0
- Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/librispeech/20210610_u2pp_conformer_exp.tar.gz

test clean, WER (%) by decoding chunk size

| decoding mode | full | 16 |
|---|---|---|
| ctc prefix beam search | 3.76 | 4.54 |
| attention rescoring | 3.32 | 3.80 |

test other, WER (%) by decoding chunk size

| decoding mode | full | 16 |
|---|---|---|
| ctc prefix beam search | 9.50 | 11.52 |
| attention rescoring | 8.67 | 10.38 |
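
The `average_num 30` setting in the decoding info refers to model averaging. The sketch below is a plain-Python illustration, assuming it means the parameters of 30 saved checkpoints are averaged element-wise before decoding. How those checkpoints are selected is not stated here, and a real checkpoint would hold tensors rather than lists. The function `average_checkpoints` and the toy parameter names are hypothetical.

```python
def average_checkpoints(checkpoints):
    """Element-wise mean of parameter dicts that share the same keys and lengths."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged = {}
    for name in checkpoints[0]:
        # Group the i-th value of this parameter across all checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        averaged[name] = [sum(values) / n for values in columns]
    return averaged


# Toy stand-ins for saved checkpoints: parameter name -> flat list of floats.
toy_checkpoints = [
    {"encoder.weight": [1.0, 2.0], "decoder.bias": [0.0]},
    {"encoder.weight": [3.0, 4.0], "decoder.bias": [1.0]},
    {"encoder.weight": [5.0, 6.0], "decoder.bias": [2.0]},
]

averaged = average_checkpoints(toy_checkpoints)
print(averaged)
```

With `average_num 30` the list would hold 30 such dictionaries instead of three.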
- Feature info: using fbank feature, cmvn, speed perturb, dither
- Training info: train_unified_conformer.yaml, lr 0.001, batch size 10, 8 gpu, acc_grad 1, 120 epochs, dither 1.0
- Decoding info: ctc_weight 0.5, average_num 30
- Git hash: 90d9a559840e765e82119ab72a11a1f7c1a01b78
- Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/librispeech/20210215_unified_conformer_exp.tar.gz
- LM-tgmed: 3-gram.pruned.1e-7.arpa.gz
- LM-tglarge: 3-gram.arpa.gz
- LM-fglarge: 4-gram.arpa.gz

test clean, WER (%) by decoding chunk size

| decoding mode | full | 16 |
|---|---|---|
| ctc prefix beam search | 4.26 | 5.00 |
| attention decoder | 3.05 | 3.44 |
| attention rescoring | 3.72 | 4.10 |
| attention rescoring (beam 50) | 3.57 | 3.95 |
| LM-tgmed + attention rescoring | 3.56 | 4.02 |
| LM-tglarge + attention rescoring | 3.40 | 3.82 |
| LM-fglarge + attention rescoring | 3.38 | 3.74 |

test other, WER (%) by decoding chunk size

| decoding mode | full | 16 |
|---|---|---|
| ctc prefix beam search | 10.87 | 12.87 |
| attention decoder | 9.07 | 10.44 |
| attention rescoring | 9.74 | 11.61 |
| attention rescoring (beam 50) | 9.34 | 11.13 |
| LM-tgmed + attention rescoring | 8.78 | 10.26 |
| LM-tglarge + attention rescoring | 8.34 | 9.74 |
| LM-fglarge + attention rescoring | 8.17 | 9.44 |
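
The training setups listed for the four models differ in batch size and gradient accumulation, so their learning rates are easier to compare alongside the number of utterances per optimizer step. The sketch below computes that figure from the training info lines, assuming the batch size is per GPU and counted in utterances. The `TrainSetup` class is a hypothetical helper, not part of the recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainSetup:
    config: str
    batch_size: int  # assumed to be per GPU, in utterances
    num_gpus: int
    acc_grad: int    # gradient accumulation steps
    lr: float

    @property
    def effective_batch(self) -> int:
        """Utterances contributing to one optimizer step under the stated assumption."""
        return self.batch_size * self.num_gpus * self.acc_grad


# Values copied from the "Training info" lines of the four models.
setups = [
    TrainSetup("train_conformer_bidecoder_large.yaml", 12, 8, 4, 0.002),
    TrainSetup("train_conformer.yaml", 12, 8, 4, 0.004),
    TrainSetup("train_u2++_conformer.yaml", 24, 8, 1, 0.001),
    TrainSetup("train_unified_conformer.yaml", 10, 8, 1, 0.001),
]

for setup in setups:
    print(f"{setup.config}: effective batch {setup.effective_batch}, lr {setup.lr}")
```

Under this assumption the two non-streaming models train with 384 utterances per step, while the streaming-capable models use 192 and 80.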