Automatically constructing corpus for automatic speech recognition from YouTube videos

nefastosaturo/KTSpeechCrawler

 
 

KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos

Google Colab

https://colab.research.google.com/drive/1JVKzB9N2FIcxlib1kXuGlfeIuudkM9Vr

Installation

git clone https://github.com/EgorLakomkin/KTSpeechCrawler
cd KTSpeechCrawler
pip install -r requirements.txt

Running the crawler

chmod a+x ./crawler/en_corpus.sh
./crawler/en_corpus.sh <dir_with_intermediate_results> <dir_for_resulting_samples>
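The same invocation can be scripted from Python, for example to run the crawler as part of a larger pipeline. The following is a minimal sketch, not part of the repository: the helper names `build_crawler_command` and `run_crawler` are hypothetical, and creating both directories up front is an assumption (the script may well create them itself). It expects to be run from the repository root, after the `chmod` step above.

```python
import os
import subprocess


def build_crawler_command(intermediate_dir, output_dir,
                          script="./crawler/en_corpus.sh"):
    """Return the argument list for the crawler script (hypothetical helper)."""
    return [script, intermediate_dir, output_dir]


def run_crawler(intermediate_dir, output_dir):
    """Run the crawler script and raise if it exits with a non-zero status."""
    # Assumption: pre-creating the directories is harmless; the script
    # itself may not require this.
    os.makedirs(intermediate_dir, exist_ok=True)
    os.makedirs(output_dir, exist_ok=True)
    # check=True turns a failing script into a CalledProcessError.
    subprocess.run(build_crawler_command(intermediate_dir, output_dir),
                   check=True)
```

Calling `run_crawler("tmp_results", "corpus")` would then be equivalent to `./crawler/en_corpus.sh tmp_results corpus`; both directory names here are placeholders.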

Browsing samples

python server.py --corpus <dir_for_resulting_samples>
Then open http://localhost:8888/ in a browser.
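If it is more convenient to start the sample browser and open the page in one step, the two actions can be combined. This is a rough sketch under stated assumptions: `wait_for_port` and `browse_corpus` are hypothetical helpers, not part of the repository, and the only facts taken from the instructions above are the `--corpus` option and port 8888.

```python
import socket
import subprocess
import sys
import time
import webbrowser


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once a TCP connection succeeds, False after the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def browse_corpus(corpus_dir, host="localhost", port=8888):
    """Start server.py on corpus_dir and open the sample browser."""
    # Assumption: this runs from the repository root, where server.py lives.
    server = subprocess.Popen(
        [sys.executable, "server.py", "--corpus", corpus_dir])
    if not wait_for_port(host, port):
        server.terminate()
        raise RuntimeError(f"server.py is not listening on port {port}")
    webbrowser.open(f"http://{host}:{port}/")
    return server  # the caller stops the server with server.terminate()
```

Polling the port before opening the browser avoids a "connection refused" page when the server takes a moment to start.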

Citation

@article{lakomkin2018kt,
  title={KT-Speech-Crawler: Automatic Dataset Construction for Speech Recognition from YouTube Videos},
  author={Lakomkin, Egor and Magg, Sven and Weber, Cornelius and Wermter, Stefan},
  journal={EMNLP 2018},
  pages={90},
  year={2018}
}
