DerivedFunction committed
Commit 6b7ec2e · verified · 1 Parent(s): 4a741b3

Update README.md

  1. README.md +43 -0
README.md CHANGED
@@ -127,6 +127,49 @@ The model supports the following ISO-coded languages:

  > Note that Romanized versions of languages, such as Romanized Russian and Hindi, are not included in the training set.

+ Coverage from a sample is as follows:
+
+ Per-group coverage (examples / tokens):
+ English 47 examples | 3947 tokens
+ Russian 47 examples | 3665 tokens
+ German 58 examples | 4625 tokens
+ Japanese 50 examples | 4188 tokens
+ Chinese 60 examples | 4131 tokens
+ French 40 examples | 3723 tokens
+ Spanish 44 examples | 4756 tokens
+ Portuguese 27 examples | 2130 tokens
+ Italian 57 examples | 5178 tokens
+ Polish 25 examples | 1753 tokens
+ Dutch 44 examples | 3082 tokens
+ Turkish 35 examples | 2315 tokens
+ SoutheastAsianLatin 114 examples | 8861 tokens
+ CentralEuropeanLatin 125 examples | 9761 tokens
+ Korean 38 examples | 3958 tokens
+ EastSlavicCyrillic 85 examples | 7471 tokens
+ Arabic 45 examples | 2508 tokens
+ NordicCore 194 examples | 14094 tokens
+ BalkanCyrillic 71 examples | 6231 tokens
+ ArabicOther 92 examples | 8010 tokens
+ Hindi 33 examples | 3251 tokens
+ IndicOther 261 examples | 40630 tokens
+ CentralAsianCyrillic 57 examples | 3789 tokens
+ AfricanLatin 82 examples | 5910 tokens
+ OtherScripts 269 examples | 28603 tokens
+
+ Top token languages:
+ ml 8197
+ it 5178
+ ta 4903
+ he 4873
+ es 4756
+ de 4625
+ kn 4613
+ pa 4457
+ ja 4188
+ zh 4131
+ uk 4007
+ ko 3958
+
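As an illustration (not part of the commit), the per-group lines above follow a regular `<group> <n> examples | <m> tokens` pattern, so they can be parsed to compare average example length across groups. The Python sketch below assumes that exact line format; `parse_coverage` and `tokens_per_example` are hypothetical helper names, not part of the model's tooling.

```python
import re

# Assumed line format, e.g. "English 47 examples | 3947 tokens".
LINE_RE = re.compile(
    r"^(?P<group>\S+)\s+(?P<examples>\d+) examples \| (?P<tokens>\d+) tokens$"
)


def parse_coverage(report: str) -> dict[str, tuple[int, int]]:
    """Map each group name to (examples, tokens); other lines are skipped."""
    coverage = {}
    for line in report.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            coverage[match["group"]] = (
                int(match["examples"]),
                int(match["tokens"]),
            )
    return coverage


def tokens_per_example(coverage: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Average tokens per example for every group with at least one example."""
    return {
        group: tokens / examples
        for group, (examples, tokens) in coverage.items()
        if examples
    }


# Three lines copied from the coverage report above.
sample = """\
English 47 examples | 3947 tokens
Polish 25 examples | 1753 tokens
IndicOther 261 examples | 40630 tokens
"""

coverage = parse_coverage(sample)
averages = tokens_per_example(coverage)
for group, avg in sorted(averages.items(), key=lambda kv: -kv[1]):
    print(f"{group:<12} {avg:6.1f} tokens/example")
```

Dividing tokens by examples makes it easier to see that the groups differ in average example length as well as in example count.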
  ## Evaluation
  ### The model scored the following on `papulca/language-identification`'s test set
  |Language | Correct | Total | Accuracy |