ACCURACY_TABLE.md

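| Language | Average (Lingua) | Average (Tika) | Average (OpenNLP) | Average (Optimaize) | Single Words (Lingua) | Single Words (Tika) | Single Words (OpenNLP) | Single Words (Optimaize) | Word Pairs (Lingua) | Word Pairs (Tika) | Word Pairs (OpenNLP) | Word Pairs (Optimaize) | Sentences (Lingua) | Sentences (Tika) | Sentences (OpenNLP) | Sentences (Optimaize) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Afrikaans | 79 | 71 | 72 | 39 | 58 | 44 | 41 | 3 | 81 | 70 | 75 | 22 | 97 | 98 | 99 | 93 |
| Albanian | 88 | 79 | 71 | 70 | 69 | 54 | 40 | 38 | 95 | 84 | 73 | 73 | 100 | 99 | 100 | 98 |
| Arabic | 98 | 97 | 84 | 89 | 96 | 94 | 65 | 72 | 99 | 99 | 88 | 94 | 100 | 100 | 99 | 100 |
| Armenian | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - |
| Azerbaijani | 90 | - | 82 | - | 77 | - | 60 | - | 92 | - | 86 | - | 99 | - | 99 | - |
| Basque | 84 | 83 | 77 | 66 | 71 | 64 | 56 | 33 | 87 | 86 | 82 | 70 | 93 | 98 | 92 | 95 |
| Belarusian | 97 | 96 | 91 | 87 | 92 | 92 | 78 | 69 | 99 | 98 | 95 | 92 | 100 | 100 | 100 | 99 |
| Bengali | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Bokmal | 58 | - | 66 | - | 39 | - | 42 | - | 59 | - | 69 | - | 75 | - | 87 | - |
| Bosnian | 35 | - | 26 | - | 29 | - | 12 | - | 35 | - | 22 | - | 40 | - | 44 | - |
| Bulgarian | 87 | 73 | 83 | 48 | 70 | 52 | 62 | 18 | 91 | 69 | 87 | 36 | 99 | 96 | 100 | 91 |
| Catalan | 70 | 58 | 42 | 31 | 50 | 32 | 11 | 2 | 74 | 57 | 32 | 16 | 86 | 84 | 81 | 77 |
| Chinese | 100 | 69 | 78 | 31 | 100 | 20 | 40 | 0 | 100 | 86 | 94 | 2 | 100 | 100 | 100 | 91 |
| Croatian | 72 | 74 | 50 | 41 | 53 | 54 | 23 | 8 | 74 | 72 | 44 | 24 | 90 | 97 | 81 | 91 |
| Czech | 80 | 72 | 67 | 49 | 65 | 54 | 42 | 21 | 84 | 75 | 70 | 46 | 91 | 88 | 90 | 81 |
| Danish | 81 | 83 | 60 | 55 | 61 | 63 | 34 | 19 | 84 | 86 | 52 | 51 | 98 | 99 | 94 | 96 |
| Dutch | 77 | 60 | 61 | 39 | 55 | 31 | 31 | 6 | 81 | 52 | 57 | 19 | 96 | 98 | 97 | 91 |
| English | 81 | 64 | 52 | 41 | 55 | 30 | 10 | 2 | 89 | 62 | 46 | 23 | 99 | 99 | 99 | 97 |
| Esperanto | 84 | - | 76 | - | 67 | - | 50 | - | 85 | - | 78 | - | 98 | - | 100 | - |
| Estonian | 92 | 84 | 59 | 61 | 80 | 66 | 29 | 23 | 96 | 88 | 60 | 63 | 100 | 100 | 88 | 98 |
| Finnish | 96 | 94 | 86 | 79 | 90 | 86 | 68 | 51 | 98 | 96 | 91 | 86 | 100 | 100 | 100 | 100 |
| French | 89 | 78 | 59 | 54 | 74 | 55 | 25 | 18 | 94 | 80 | 55 | 48 | 99 | 99 | 98 | 97 |
| Ganda | 91 | - | - | - | 79 | - | - | - | 95 | - | - | - | 100 | - | - | - |
| Georgian | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - | 100 | - |
| German | 89 | 74 | 67 | 55 | 74 | 50 | 38 | 21 | 94 | 71 | 66 | 46 | 100 | 100 | 98 | 99 |
| Greek | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Gujarati | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Hebrew | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Hindi | 73 | 80 | 58 | 51 | 61 | 65 | 28 | 16 | 64 | 75 | 49 | 38 | 93 | 99 | 99 | 98 |
| Hungarian | 95 | 88 | 78 | 77 | 86 | 75 | 53 | 51 | 98 | 91 | 82 | 82 | 100 | 100 | 100 | 99 |
| Icelandic | 93 | 90 | 76 | 78 | 83 | 76 | 53 | 53 | 97 | 94 | 76 | 82 | 100 | 100 | 99 | 99 |
| Indonesian | 60 | 60 | 29 | 18 | 39 | 37 | 10 | 0 | 61 | 62 | 25 | 1 | 81 | 82 | 52 | 54 |
| Irish | 91 | 90 | 78 | 80 | 82 | 80 | 56 | 58 | 94 | 92 | 82 | 85 | 96 | 99 | 97 | 98 |
| Italian | 87 | 80 | 64 | 51 | 69 | 58 | 31 | 12 | 92 | 84 | 61 | 43 | 100 | 99 | 100 | 98 |
| Japanese | 100 | 25 | 95 | 98 | 100 | 1 | 87 | 99 | 100 | 5 | 100 | 100 | 100 | 68 | 100 | 96 |
| Kazakh | 92 | - | 85 | - | 80 | - | 66 | - | 96 | - | 90 | - | 99 | - | 100 | - |
| Korean | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Latin | 87 | - | 70 | - | 72 | - | 43 | - | 93 | - | 71 | - | 97 | - | 96 | - |
| Latvian | 93 | 90 | 86 | 78 | 85 | 78 | 72 | 56 | 97 | 93 | 88 | 82 | 98 | 98 | 98 | 97 |
| Lithuanian | 95 | 89 | 79 | 72 | 86 | 74 | 56 | 40 | 98 | 92 | 83 | 77 | 100 | 99 | 99 | 98 |
| Macedonian | 84 | 83 | 68 | 46 | 66 | 66 | 37 | 10 | 86 | 83 | 68 | 32 | 99 | 100 | 98 | 97 |
| Malay | 31 | 23 | 19 | 4 | 26 | 19 | 10 | 0 | 38 | 22 | 20 | 0 | 30 | 28 | 27 | 11 |
| Maori | 92 | - | 92 | - | 84 | - | 85 | - | 92 | - | 90 | - | 99 | - | 100 | - |
| Marathi | 85 | 90 | 81 | 71 | 74 | 81 | 62 | 43 | 85 | 92 | 83 | 74 | 96 | 98 | 98 | 96 |
| Mongolian | 97 | - | 84 | - | 93 | - | 66 | - | 99 | - | 88 | - | 99 | - | 99 | - |
| Nynorsk | 66 | - | 55 | - | 41 | - | 24 | - | 66 | - | 47 | - | 90 | - | 92 | - |
| Persian | 90 | 81 | 75 | 62 | 78 | 65 | 53 | 29 | 94 | 79 | 74 | 58 | 100 | 99 | 99 | 99 |
| Polish | 95 | 90 | 83 | 81 | 85 | 76 | 61 | 57 | 98 | 93 | 89 | 86 | 100 | 100 | 100 | 100 |
| Portuguese | 81 | 63 | 58 | 40 | 59 | 34 | 22 | 7 | 85 | 58 | 54 | 19 | 98 | 98 | 98 | 94 |
| Punjabi | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Romanian | 87 | 78 | 67 | 55 | 69 | 57 | 34 | 24 | 92 | 80 | 68 | 50 | 99 | 97 | 99 | 91 |
| Russian | 90 | 80 | 50 | 53 | 76 | 62 | 20 | 22 | 95 | 85 | 43 | 50 | 98 | 94 | 86 | 87 |
| Serbian | 88 | 73 | 73 | 46 | 74 | 57 | 46 | 18 | 90 | 70 | 74 | 39 | 99 | 90 | 98 | 80 |
| Shona | 91 | - | - | - | 78 | - | - | - | 96 | - | - | - | 100 | - | - | - |
| Slovak | 84 | 76 | 70 | 47 | 64 | 53 | 39 | 12 | 90 | 76 | 73 | 38 | 99 | 98 | 99 | 92 |
| Slovene | 82 | 74 | 71 | 37 | 61 | 53 | 43 | 3 | 87 | 72 | 72 | 18 | 99 | 98 | 99 | 90 |
| Somali | 92 | 91 | 69 | 79 | 82 | 78 | 35 | 50 | 96 | 94 | 74 | 88 | 100 | 100 | 98 | 100 |
| Sotho | 85 | - | - | - | 67 | - | - | - | 90 | - | - | - | 99 | - | - | - |
| Spanish | 70 | 59 | 42 | 32 | 44 | 29 | 8 | 0 | 69 | 50 | 25 | 6 | 97 | 97 | 93 | 91 |
| Swahili | 81 | 75 | 73 | 60 | 60 | 50 | 45 | 26 | 84 | 75 | 74 | 58 | 98 | 99 | 99 | 98 |
| Swedish | 84 | 71 | 69 | 50 | 64 | 44 | 41 | 15 | 88 | 72 | 69 | 42 | 99 | 97 | 97 | 94 |
| Tagalog | 78 | 77 | 61 | 61 | 52 | 53 | 27 | 23 | 83 | 79 | 57 | 62 | 99 | 99 | 98 | 97 |
| Tamil | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Telugu | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 |
| Thai | 99 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 100 | 98 | 100 | 99 | 100 |
| Tsonga | 84 | - | - | - | 66 | - | - | - | 89 | - | - | - | 98 | - | - | - |
| Tswana | 84 | - | - | - | 65 | - | - | - | 88 | - | - | - | 99 | - | - | - |
| Turkish | 94 | 81 | 72 | 70 | 84 | 62 | 48 | 43 | 98 | 83 | 71 | 70 | 100 | 99 | 98 | 96 |
| Ukrainian | 93 | 81 | 79 | 68 | 84 | 62 | 54 | 39 | 97 | 84 | 83 | 69 | 96 | 97 | 99 | 94 |
| Urdu | 91 | 83 | 68 | 72 | 80 | 68 | 45 | 49 | 94 | 84 | 62 | 71 | 98 | 96 | 98 | 96 |
| Vietnamese | 91 | 85 | 84 | 87 | 79 | 63 | 66 | 65 | 94 | 92 | 86 | 95 | 99 | 100 | 100 | 100 |
| Welsh | 91 | 85 | 77 | 77 | 78 | 68 | 50 | 50 | 96 | 88 | 81 | 82 | 99 | 100 | 99 | 99 |
| Xhosa | 82 | - | - | - | 64 | - | - | - | 85 | - | - | - | 98 | - | - | - |
| Yoruba | 72 | - | - | - | 47 | - | - | - | 75 | - | - | - | 94 | - | - | - |
| Zulu | 81 | - | 78 | - | 62 | - | 51 | - | 83 | - | 82 | - | 97 | - | 100 | - |
| Mean | 86 | 80 | 74 | 65 | 74 | 64 | 53 | 41 | 89 | 81 | 74 | 61 | 96 | 96 | 95 | 93 |
| Median | 89.23 | 81.3 | 75.55 | 63.85 | 74.3 | 63.39 | 48.7 | 30.75 | 93.7 | 84.25 | 75.55 | 65.95 | 98.9 | 99.15 | 99.0 | 97.4 |
| Standard Deviation | 13.15 | 16.2 | 18.56 | 23.87 | 18.48 | 23.9 | 27.37 | 33.87 | 13.15 | 18.74 | 21.32 | 31.32 | 11.02 | 10.77 | 12.59 | 13.54 |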
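
The table does not say how the Average column is derived. For the rows spot-checked here, it matches the unweighted mean of the Single Words, Word Pairs and Sentences values, rounded to the nearest integer. The sketch below illustrates that assumption on four rows copied from the table; the helper name `average_accuracy` is hypothetical. The published figures may have been computed from unrounded values, so other rows could differ by one point.

```python
# Assumption: Average = rounded unweighted mean of the three category scores.
# (single words, word pairs, sentences) values copied from the table.
category_scores = {
    ("Afrikaans", "Lingua"): (58, 81, 97),
    ("English", "Lingua"): (55, 89, 99),
    ("Japanese", "Tika"): (1, 5, 68),
    ("Malay", "Lingua"): (26, 38, 30),
}

# Average column values copied from the same rows of the table.
reported_average = {
    ("Afrikaans", "Lingua"): 79,
    ("English", "Lingua"): 81,
    ("Japanese", "Tika"): 25,
    ("Malay", "Lingua"): 31,
}


def average_accuracy(single_words, word_pairs, sentences):
    """Hypothetical helper: unweighted mean of three scores, rounded."""
    return round((single_words + word_pairs + sentences) / 3)


for key, scores in category_scores.items():
    language, detector = key
    computed = average_accuracy(*scores)
    status = "matches" if computed == reported_average[key] else "differs from"
    print(f"{language} ({detector}): computed {computed} {status} "
          f"reported {reported_average[key]}")
```

Because the mean is unweighted, a weak single-word score pulls the average down as much as a strong sentence score pulls it up, which is why several detectors with near-perfect sentence accuracy still show much lower averages.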