We are publishing pre-trained word vectors for 90 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.
The word vectors come in both of fastText's default formats, binary and text. In the text format, each line contains a word followed by its embedding, with the values separated by spaces. Words are sorted by frequency in descending order.
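As an illustration of the text format described above, here is a minimal Python sketch of a reader. The function name `load_vectors` and its `limit` parameter are hypothetical and not part of fastText. The sketch also assumes that the file may start with a header line holding two integers (vocabulary size and dimension) and skips such a line if it is present.

```python
import io
from typing import Dict, Iterable, List, Optional


def load_vectors(lines: Iterable[str], limit: Optional[int] = None) -> Dict[str, List[float]]:
    """Parse text-format vectors: one word per line, followed by its values.

    `limit` (hypothetical helper parameter) stops after that many words.
    """
    vectors: Dict[str, List[float]] = {}
    for i, line in enumerate(lines):
        parts = line.rstrip().split(" ")
        # Assumption: an optional first line "<word count> <dimension>".
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        if limit is not None and len(vectors) >= limit:
            break
        vectors[parts[0]] = [float(v) for v in parts[1:]]
    return vectors


# Tiny made-up sample with 3 values per word, for illustration only.
sample = io.StringIO("3 3\nthe 0.1 0.2 0.3\nof 0.4 0.5 0.6\nand 0.7 0.8 0.9\n")
vectors = load_vectors(sample, limit=2)
print(list(vectors))
```

With a downloaded file, the same function can be given a file object opened with `encoding="utf-8"`. Because words are sorted by descending frequency, passing a `limit` keeps only the most frequent words and lowers memory use.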
The pre-trained word vectors are distributed under the Creative Commons Attribution-ShareAlike 3.0 License.
If you use these word embeddings, please cite the following paper:
P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2016enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.04606},
year={2016}
}
The models can be downloaded from:
- Afrikaans: bin+text, text
- Albanian: bin+text, text
- Arabic: bin+text, text
- Armenian: bin+text, text
- Asturian: bin+text, text
- Azerbaijani: bin+text, text
- Bashkir: bin+text, text
- Basque: bin+text, text
- Belarusian: bin+text, text
- Bengali: bin+text, text
- Bosnian: bin+text, text
- Breton: bin+text, text
- Bulgarian: bin+text, text
- Burmese: bin+text, text
- Catalan: bin+text, text
- Cebuano: bin+text, text
- Chechen: bin+text, text
- Chinese: bin+text, text
- Chuvash: bin+text, text
- Croatian: bin+text, text
- Czech: bin+text, text
- Danish: bin+text, text
- Dutch: bin+text, text
- English: bin+text, text
- Esperanto: bin+text, text
- Estonian: bin+text, text
- Farsi: bin+text, text
- Finnish: bin+text, text
- French: bin+text, text
- Galician: bin+text, text
- Georgian: bin+text, text
- German: bin+text, text
- Greek: bin+text, text
- Gujarati: bin+text, text
- Hebrew: bin+text, text
- Hindi: bin+text, text
- Hungarian: bin+text, text
- Icelandic: bin+text, text
- Indonesian: bin+text, text
- Italian: bin+text, text
- Japanese: bin+text, text
- Kannada: bin+text, text
- Kazakh: bin+text, text
- Khmer: bin+text, text
- Korean: bin+text, text
- Kyrgyz: bin+text, text
- Latin: bin+text, text
- Latvian: bin+text, text
- Lithuanian: bin+text, text
- Luxembourgish: bin+text, text
- Macedonian: bin+text, text
- Malagasy: bin+text, text
- Malayalam: bin+text, text
- Malay: bin+text, text
- Marathi: bin+text, text
- Minangkabau: bin+text, text
- Mongolian: bin+text, text
- Nepali: bin+text, text
- Newar: bin+text, text
- Norwegian: bin+text, text
- Occitan: bin+text, text
- Polish: bin+text, text
- Portuguese: bin+text, text
- Punjabi: bin+text, text
- Romanian: bin+text, text
- Russian: bin+text, text
- Sanskrit: bin+text, text
- Scots: bin+text, text
- Serbian: bin+text, text
- Serbo-Croatian: bin+text, text
- Sinhalese: bin+text, text
- Slovak: bin+text, text
- Slovene: bin+text, text
- Spanish: bin+text, text
- Swedish: bin+text, text
- Tagalog: bin+text, text
- Tajik: bin+text, text
- Tamil: bin+text, text
- Tatar: bin+text, text
- Telugu: bin+text, text
- Thai: bin+text, text
- Turkish: bin+text, text
- Ukrainian: bin+text, text
- Urdu: bin+text, text
- Uzbek: bin+text, text
- Vietnamese: bin+text, text
- Volapük: bin+text, text
- Waray: bin+text, text
- Welsh: bin+text, text
- Western Frisian: bin+text, text