DerivedFunction committed on
Commit d020bd2 · verified · 1 Parent(s): 8ce3cd1

Update README.md

  1. README.md +59 -0
README.md CHANGED
@@ -127,6 +127,7 @@ The model supports the following ISO-coded languages:
 
  > Note that Romanized versions of languages, such as Romanized Russian and Hindi, are not included in the training set.
 
+ ## Evaluation
  ### The model scored the following on `papluca/language-identification`'s test set
  |Language | Correct | Total | Accuracy |
  |-------------|----------|-------------|--------|
@@ -152,6 +153,64 @@ The model supports the following ISO-coded languages:
 
  > As the training data is slightly biased toward English text, the model may produce tokens for English rather than the target language in the Latin family.
 
+ ### The model scored the following on `mikaberidze/lid200`'s test set, which is derived from `Davlan/sib200`
+
+ |Language | Correct | Total | Accuracy |
+ |-------------|----------|-------------|--------|
+ |af | 204 | 204 | 100.0% |
+ |am | 204 | 204 | 100.0% |
+ |as | 204 | 204 | 100.0% |
+ |be | 204 | 204 | 100.0% |
+ |bg | 204 | 204 | 100.0% |
+ |bn | 204 | 204 | 100.0% |
+ |cs | 204 | 204 | 100.0% |
+ |da | 203 | 204 | 99.5% |
+ |de | 204 | 204 | 100.0% |
+ |el | 204 | 204 | 100.0% |
+ |en | 204 | 204 | 100.0% |
+ |es | 204 | 204 | 100.0% |
+ |fi | 204 | 204 | 100.0% |
+ |fr | 204 | 204 | 100.0% |
+ |gu | 204 | 204 | 100.0% |
+ |he | 204 | 204 | 100.0% |
+ |hi | 204 | 204 | 100.0% |
+ |hu | 204 | 204 | 100.0% |
+ |hy | 204 | 204 | 100.0% |
+ |id | 198 | 204 | 97.1% |
+ |is | 204 | 204 | 100.0% |
+ |it | 204 | 204 | 100.0% |
+ |ja | 204 | 204 | 100.0% |
+ |ka | 204 | 204 | 100.0% |
+ |kk | 204 | 204 | 100.0% |
+ |km | 204 | 204 | 100.0% |
+ |kn | 204 | 204 | 100.0% |
+ |ko | 204 | 204 | 100.0% |
+ |lo | 204 | 204 | 100.0% |
+ |mk | 203 | 204 | 99.5% |
+ |ml | 204 | 204 | 100.0% |
+ |mr | 204 | 204 | 100.0% |
+ |my | 204 | 204 | 100.0% |
+ |nl | 203 | 204 | 99.5% |
+ |pa | 204 | 204 | 100.0% |
+ |pl | 204 | 204 | 100.0% |
+ |pt | 204 | 204 | 100.0% |
+ |ro | 204 | 204 | 100.0% |
+ |ru | 204 | 204 | 100.0% |
+ |sd | 204 | 204 | 100.0% |
+ |sr | 204 | 204 | 100.0% |
+ |sv | 204 | 204 | 100.0% |
+ |ta | 204 | 204 | 100.0% |
+ |te | 204 | 204 | 100.0% |
+ |th | 204 | 204 | 100.0% |
+ |tr | 204 | 204 | 100.0% |
+ |ug | 204 | 204 | 100.0% |
+ |uk | 204 | 204 | 100.0% |
+ |ur | 204 | 204 | 100.0% |
+ |vi | 204 | 204 | 100.0% |
+ |zh | 408 | 408 | 100.0% |
+
+ > Caution: the training data includes text from Wikipedia and Finetranslations, which may skew the results.
+
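The Accuracy column in the `mikaberidze/lid200` table appears to be Correct divided by Total, shown as a percentage with one decimal place. A minimal Python sketch of that arithmetic, using three rows copied from the table; the helper name `accuracy_pct` is hypothetical and is not taken from the model's evaluation code:

```python
# Rows copied from the lid200 table: (language, correct, total).
rows = [("da", 203, 204), ("id", 198, 204), ("zh", 408, 408)]


def accuracy_pct(correct: int, total: int) -> str:
    """Format correct/total as a percentage with one decimal place.

    Hypothetical helper for illustration; the README does not show
    how its Accuracy column was actually produced.
    """
    return f"{100 * correct / total:.1f}%"


# Print the rows in the same pipe-delimited layout as the README table.
for lang, correct, total in rows:
    print(f"|{lang} | {correct} | {total} | {accuracy_pct(correct, total)} |")
```

Under this assumption, the three sample rows reproduce the percentages listed for `da`, `id`, and `zh`.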
  ### Training hyperparameters
 
  The following hyperparameters were used during training: