This is an experimental DeepSpeech model for the Czech language. The model is released under the CC-BY-NC license. The following datasets were used:
- Vystadial 2016 – Czech data by Plátek, Ondřej ; Dušek, Ondřej ; Jurčíček, Filip (CC-BY-SA 4.0)
- OVM – Otázky Václava Moravce by Šmídl, Luboš ; Pražák, Aleš (CC-BY-NC 3.0)
- Czech Parliament Meetings by Pražák, Aleš ; Šmídl, Luboš (CC-BY-NC-ND 3.0)
- Large Corpus of Czech Parliament Plenary Hearings by Kratochvíl, Jonáš ; Polák, Peter ; Bojar, Ondřej (CC-BY 4.0)
- Common Voice Czech by Mozilla (CC0)
- Some private recordings and parts of audiobooks
The model was originally transfer-learned from the English DeepSpeech/Coqui model, version 0.9.3.
The released scorers were created using the CWC 2011 Corpus by Spoustová, Johanka and Spousta, Miroslav (CC-BY 3.0), a Wikipedia XML dump, the Czech part of Europarl v7, public domain e-books from the Municipal Library of Prague, and transcriptions of the training data.
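
As a rough illustration of how the model and a scorer might be used together, here is a minimal sketch based on the `deepspeech` Python package (0.9.x API). It assumes the released files are compatible with that package, since the model was transfer-learned from version 0.9.3, and that the input is 16-bit mono WAV at the model's sample rate. The function names and any file paths passed to them are hypothetical and not part of this release.

```python
import wave


def read_pcm16_mono(path, expected_rate=16000):
    """Return raw 16-bit mono PCM bytes from a WAV file, validating its format."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            raise ValueError("expected mono audio")
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz, got {wav.getframerate()} Hz"
            )
        return wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Transcribe one WAV file; all three paths are supplied by the caller."""
    # Imported lazily so the WAV helper above works without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    # The external scorer (language model) is optional but usually helps accuracy.
    model.enableExternalScorer(scorer_path)

    # Ask the model which sample rate it expects instead of hard-coding it.
    pcm = read_pcm16_mono(wav_path, expected_rate=model.sampleRate())
    audio = np.frombuffer(pcm, dtype=np.int16)
    return model.stt(audio)
```

Calling `transcribe()` with the paths to a downloaded model file, one of the released scorers, and a recording should return the recognized Czech text as a string. Audio in another format or sample rate would need to be converted first.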