This repository provides a collection of wordlists derived from the article titles of Wikipedia editions in many languages. The data was extracted from Wikidata.

The data directory contains subdirectories organized by ISO language code. Each wordlist follows the filename pattern `[ISO]-wordlist_wiki.txt`, where `[ISO]` is the target language's ISO code. All available languages are listed below.
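
As a minimal sketch of how the filename pattern maps to a path, the Python below assumes each file sits at `data/<code>/<code>-wordlist_wiki.txt` and holds one UTF-8 entry per line. The `data` directory name, the one-entry-per-line layout and the helper names `wordlist_path` and `load_wordlist` are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Assumption: wordlists live at data/<code>/<code>-wordlist_wiki.txt,
# are UTF-8 encoded and hold one entry per line.
DATA_DIR = Path("data")


def wordlist_path(code: str, data_dir: Path = DATA_DIR) -> Path:
    """Build the (assumed) path of the wordlist for a code such as 'en' or 'zh_yue'."""
    return data_dir / code / f"{code}-wordlist_wiki.txt"


def load_wordlist(code: str, data_dir: Path = DATA_DIR) -> list[str]:
    """Return the non-blank entries of a wordlist without their line endings."""
    with wordlist_path(code, data_dir).open(encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f if line.strip()]
```

For example, `wordlist_path("zh_yue")` yields `data/zh_yue/zh_yue-wordlist_wiki.txt`; codes that are not two-letter codes are used verbatim in both the directory and the file name.
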

| Language code | Language name |
|---|---|
| af | Afrikaans |
| am | Amharic |
| ang | Anglo-Saxon |
| ar | Arabic |
| arc | Aramaic |
| bg | Bulgarian |
| bi | Bislama |
| bn | Bengali |
| bo | Tibetan |
| br | Breton |
| bs | Bosnian |
| ca | Catalan |
| cdo | Min Dong |
| chr | Cherokee |
| chy | Cheyenne |
| cr | Cree |
| cs | Czech |
| cy | Welsh |
| da | Danish |
| de | German |
| el | Greek |
| en | English |
| eo | Esperanto |
| es | Spanish |
| et | Estonian |
| eu | Basque |
| fa | Persian |
| ff | Fula |
| fi | Finnish |
| fr | French |
| ga | Irish |
| gan | Gan |
| gd | Scottish Gaelic |
| gu | Gujarati |
| gv | Manx |
| ha | Hausa |
| hak | Hakka |
| haw | Hawaiian |
| he | Hebrew |
| hi | Hindi |
| hr | Croatian |
| ht | Haitian |
| hu | Hungarian |
| hy | Armenian |
| id | Indonesian |
| ig | Igbo |
| is | Icelandic |
| it | Italian |
| iu | Inuktitut |
| ja | Japanese |
| jbo | Lojban |
| jv | Javanese |
| ka | Georgian |
| kg | Kongo |
| ki | Kikuyu |
| kl | Greenlandic |
| km | Khmer |
| ko | Korean |
| la | Latin |
| lg | Luganda |
| lo | Lao |
| lt | Lithuanian |
| lv | Latvian |
| mg | Malagasy |
| mi | Maori |
| mn | Mongolian |
| ms | Malay |
| mt | Maltese |
| nah | Nahuatl |
| ne | Nepali |
| nl | Dutch |
| nn | Norwegian (Nynorsk) |
| no | Norwegian |
| nv | Navajo |
| ny | Chichewa |
| oc | Occitan |
| pa | Punjabi |
| pi | Pali |
| pl | Polish |
| ps | Pashto |
| pt | Portuguese |
| qu | Quechua |
| ro | Romanian |
| ru | Russian |
| sa | Sanskrit |
| se | Northern Sami |
| sh | Serbo-Croatian |
| sk | Slovak |
| sl | Slovenian |
| sn | Shona |
| so | Somali |
| sq | Albanian |
| sr | Serbian |
| sv | Swedish |
| sw | Kiswahili |
| ta | Tamil |
| te | Telugu |
| th | Thai |
| tl | Tagalog |
| tpi | Tok Pisin |
| tr | Turkish |
| ug | Uyghur |
| uk | Ukrainian |
| ur | Urdu |
| vi | Vietnamese |
| wo | Wolof |
| wuu | Wu |
| xh | Xhosa |
| yi | Yiddish |
| yo | Yoruba |
| za | Zhuang |
| zh | Chinese (Mandarin) |
| zh_classical | Classical Chinese |
| zh_min_nan | Min Nan |
| zh_yue | Cantonese |
| zu | Zulu |


Number of entries in each wordlist:

| Language code | Number of entries |
|---|---:|
| af | 33599 |
| am | 11014 |
| ang | 2977 |
| ar | 446845 |
| arc | 1829 |
| bg | 225573 |
| bi | 490 |
| bn | 59121 |
| bo | 2929 |
| br | 49865 |
| bs | 64229 |
| ca | 438072 |
| cdo | 2909 |
| chr | 492 |
| chy | 710 |
| cr | 70 |
| cs | 327321 |
| cy | 52130 |
| da | 196279 |
| de | 1787961 |
| el | 136650 |
| en | 4798378 |
| eo | 209308 |
| es | 1346715 |
| et | 124124 |
| eu | 203027 |
| fa | 744454 |
| ff | 464 |
| fi | 363265 |
| fr | 1862431 |
| ga | 35768 |
| gan | 14253 |
| gd | 15561 |
| gu | 27615 |
| gv | 4723 |
| ha | 518 |
| hak | 4123 |
| haw | 2009 |
| he | 209505 |
| hi | 120411 |
| hr | 139555 |
| ht | 45669 |
| hu | 323069 |
| hy | 161719 |
| id | 338477 |
| ig | 1075 |
| is | 39429 |
| it | 1183116 |
| iu | 383 |
| ja | 951498 |
| jbo | 1179 |
| jv | 45722 |
| ka | 118968 |
| kg | 868 |
| ki | 311 |
| kl | 1839 |
| km | 4713 |
| ko | 446200 |
| la | 111691 |
| lg | 179 |
| lo | 1913 |
| lt | 173148 |
| lv | 58016 |
| mg | 77182 |
| mi | 2579 |
| mn | 18668 |
| ms | 245936 |
| mt | 2981 |
| nah | 10519 |
| ne | 24961 |
| nl | 1812937 |
| nn | 117294 |
| no | 403749 |
| nv | 3887 |
| ny | 170 |
| oc | 88788 |
| pa | 14042 |
| pi | 2759 |
| pl | 1088821 |
| ps | 5148 |
| pt | 866567 |
| qu | 18494 |
| ro | 264609 |
| ru | 1461243 |
| sa | 12256 |
| se | 7216 |
| sh | 284238 |
| sk | 269048 |
| sl | 132095 |
| sn | 1671 |
| so | 2760 |
| sq | 53553 |
| sr | 351888 |
| sv | 1954061 |
| sw | 26694 |
| ta | 80394 |
| te | 63860 |
| th | 134176 |
| tl | 57983 |
| tpi | 1336 |
| tr | 247607 |
| ug | 2596 |
| uk | 638342 |
| ur | 125182 |
| vi | 1241500 |
| wo | 1636 |
| wuu | 5032 |
| xh | 319 |
| yi | 12575 |
| yo | 35053 |
| za | 808 |
| zh | 804107 |
| zh_classical | 3855 |
| zh_min_nan | 14851 |
| zh_yue | 32062 |
| zu | 689 |

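
To check a local copy against these figures, a minimal sketch like the following counts the entries of every wordlist. It assumes the layout `data/<code>/<code>-wordlist_wiki.txt` with one UTF-8 entry per line; neither is spelled out above, and the function names are hypothetical. If the files contain blank lines or other formatting, the counts may not match the table exactly.

```python
from pathlib import Path


def count_entries(path: Path) -> int:
    """Count the non-blank lines of one wordlist (assumed: one entry per line)."""
    with path.open(encoding="utf-8") as f:
        return sum(1 for line in f if line.strip())


def count_all(data_dir: Path) -> dict[str, int]:
    """Map each language subdirectory name to the entry count of its wordlist.

    Assumes the hypothetical layout data_dir/<code>/<code>-wordlist_wiki.txt;
    subdirectories without a matching file are skipped.
    """
    counts: dict[str, int] = {}
    for sub in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        wordlist = sub / f"{sub.name}-wordlist_wiki.txt"
        if wordlist.is_file():
            counts[sub.name] = count_entries(wordlist)
    return counts
```

Calling `count_all(Path("data"))` on a checkout would return a dictionary keyed by language code that can be compared row by row with the table.
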

The ten largest wordlists:

| Language code | Number of entries |
|---|---:|
| en | 4798378 |
| sv | 1954061 |
| fr | 1862431 |
| nl | 1812937 |
| de | 1787961 |
| ru | 1461243 |
| es | 1346715 |
| vi | 1241500 |
| it | 1183116 |
| pl | 1088821 |

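
This ranking is the per-language counts sorted in descending order and cut off after ten rows. A sketch of that step, using a handful of counts copied from the full table for illustration (the function name `top_n` is hypothetical):

```python
import heapq


def top_n(counts: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """Return the n (language code, entry count) pairs with the most entries, largest first."""
    return heapq.nlargest(n, counts.items(), key=lambda item: item[1])


# A few rows copied from the per-language table; not the full dataset.
SAMPLE_COUNTS = {"en": 4798378, "sv": 1954061, "fr": 1862431, "ja": 951498, "pt": 866567}
```

`top_n(SAMPLE_COUNTS, 3)` returns `[('en', 4798378), ('sv', 1954061), ('fr', 1862431)]`. `heapq.nlargest` avoids sorting the whole dictionary, although with roughly 115 languages a plain `sorted(..., reverse=True)[:n]` would work just as well.
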

According to the Wikidata website:

> All structured data from the main and property namespace is available under the Creative Commons CC0 License.

The data in this repository is therefore released under the same Creative Commons CC0 License as the Wikidata project. All of it was derived from the Wikidata JSON database dumps.
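
One way article titles can be pulled from a Wikidata JSON dump is through each entity's `sitelinks`, whose keys (such as `enwiki` or `zh_yuewiki`) pair a wiki with an article title. The sketch below illustrates this under stated assumptions and is not the procedure actually used to build this repository. It assumes a gzip-compressed dump (the file name `latest-all.json.gz` is only an example), which stores one entity per line inside a single JSON array, and it collects raw titles only. Any further processing the published wordlists received, such as splitting titles into words, filtering or sorting, is not reproduced here.

```python
import gzip
import json
from typing import Iterable, Iterator


def iter_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Yield entity objects from a Wikidata JSON dump.

    The dump is one large JSON array written with one entity per line:
    '[' and ']' sit on their own lines and every entity but the last ends with ','.
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def titles_by_language(entities: Iterable[dict], codes: Iterable[str]) -> dict[str, set[str]]:
    """Collect Wikipedia article titles per language code from entity sitelinks.

    A code such as 'zh_yue' is matched against the sitelink key 'zh_yuewiki';
    sister projects (e.g. 'enwikiquote') and codes not requested are ignored.
    """
    site_to_code = {f"{code}wiki": code for code in codes}
    titles: dict[str, set[str]] = {code: set() for code in site_to_code.values()}
    for entity in entities:
        # Properties and some items have no sitelinks at all.
        for site, link in entity.get("sitelinks", {}).items():
            code = site_to_code.get(site)
            if code is not None:
                titles[code].add(link["title"])
    return titles


def titles_from_dump(path: str, codes: Iterable[str]) -> dict[str, set[str]]:
    """Stream a gzip-compressed dump line by line (path is an assumption)."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return titles_by_language(iter_entities(f), codes)
```

Streaming the dump line by line avoids loading the whole file into memory; only the collected titles are kept, which can still be large for the biggest languages.
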