Update README.md
README.md
@@ -127,6 +127,7 @@ The model supports the following ISO-coded languages:
 
 > Note that Romanized versions of languages are not included in the training set (for example, Romanized Russian or Romanized Hindi).
 
+## Evaluation
 ### The model scored the following on `papulca/language-identification`'s test set
 |Language | Correct | Total | Accuracy |
 |-------------|----------|-------------|--------|
@@ -152,6 +153,64 @@ The model supports the following ISO-coded languages:
 
 > As the training data is slightly biased toward English text, the model may produce tokens for English rather than for the target language in the Latin family.
 
+### The model scored the following on `mikaberidze/lid200`'s test set, which is derived from `Davlan/sib200`
+
+|Language | Correct | Total | Accuracy |
+|------------|----------|-----------|-----------|
+|af | 204 | 204 | 100.0% |
+|am | 204 | 204 | 100.0% |
+|as | 204 | 204 | 100.0% |
+|be | 204 | 204 | 100.0% |
+|bg | 204 | 204 | 100.0% |
+|bn | 204 | 204 | 100.0% |
+|cs | 204 | 204 | 100.0% |
+|da | 203 | 204 | 99.5% |
+|de | 204 | 204 | 100.0% |
+|el | 204 | 204 | 100.0% |
+|en | 204 | 204 | 100.0% |
+|es | 204 | 204 | 100.0% |
+|fi | 204 | 204 | 100.0% |
+|fr | 204 | 204 | 100.0% |
+|gu | 204 | 204 | 100.0% |
+|he | 204 | 204 | 100.0% |
+|hi | 204 | 204 | 100.0% |
+|hu | 204 | 204 | 100.0% |
+|hy | 204 | 204 | 100.0% |
+|id | 198 | 204 | 97.1% |
+|is | 204 | 204 | 100.0% |
+|it | 204 | 204 | 100.0% |
+|ja | 204 | 204 | 100.0% |
+|ka | 204 | 204 | 100.0% |
+|kk | 204 | 204 | 100.0% |
+|km | 204 | 204 | 100.0% |
+|kn | 204 | 204 | 100.0% |
+|ko | 204 | 204 | 100.0% |
+|lo | 204 | 204 | 100.0% |
+|mk | 203 | 204 | 99.5% |
+|ml | 204 | 204 | 100.0% |
+|mr | 204 | 204 | 100.0% |
+|my | 204 | 204 | 100.0% |
+|nl | 203 | 204 | 99.5% |
+|pa | 204 | 204 | 100.0% |
+|pl | 204 | 204 | 100.0% |
+|pt | 204 | 204 | 100.0% |
+|ro | 204 | 204 | 100.0% |
+|ru | 204 | 204 | 100.0% |
+|sd | 204 | 204 | 100.0% |
+|sr | 204 | 204 | 100.0% |
+|sv | 204 | 204 | 100.0% |
+|ta | 204 | 204 | 100.0% |
+|te | 204 | 204 | 100.0% |
+|th | 204 | 204 | 100.0% |
+|tr | 204 | 204 | 100.0% |
+|ug | 204 | 204 | 100.0% |
+|uk | 204 | 204 | 100.0% |
+|ur | 204 | 204 | 100.0% |
+|vi | 204 | 204 | 100.0% |
+|zh | 408 | 408 | 100.0% |
+
+> Caution: the training data includes text from Wikipedia and Finetranslations, which may overlap with the source text of this test set and skew these results.
+
 ### Training hyperparameters
 
 The following hyperparameters were used during training:
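In the `mikaberidze/lid200` table, the Accuracy column is Correct / Total expressed as a percentage and rounded to one decimal place. The README reports no overall figure. As a quick sanity check, the following minimal Python sketch recomputes the non-perfect rows and pools the counts two ways:

- **Micro-average:** `zh`, with 408 samples, weighs twice as much as each other language.
- **Macro-average:** every language counts once.

The counts in `ROWS` are transcribed by hand from the table. `ROWS` and the helper function names are illustrative only; they are not part of the model's code or the dataset's API.

```python
"""Recompute accuracy figures from the `mikaberidze/lid200` table counts.

Assumption: ROWS is transcribed by hand from the README table; nothing here
loads the dataset or runs the model.
"""

# (language code, correct, total) exactly as listed in the README table.
ROWS = [
    ("af", 204, 204), ("am", 204, 204), ("as", 204, 204), ("be", 204, 204),
    ("bg", 204, 204), ("bn", 204, 204), ("cs", 204, 204), ("da", 203, 204),
    ("de", 204, 204), ("el", 204, 204), ("en", 204, 204), ("es", 204, 204),
    ("fi", 204, 204), ("fr", 204, 204), ("gu", 204, 204), ("he", 204, 204),
    ("hi", 204, 204), ("hu", 204, 204), ("hy", 204, 204), ("id", 198, 204),
    ("is", 204, 204), ("it", 204, 204), ("ja", 204, 204), ("ka", 204, 204),
    ("kk", 204, 204), ("km", 204, 204), ("kn", 204, 204), ("ko", 204, 204),
    ("lo", 204, 204), ("mk", 203, 204), ("ml", 204, 204), ("mr", 204, 204),
    ("my", 204, 204), ("nl", 203, 204), ("pa", 204, 204), ("pl", 204, 204),
    ("pt", 204, 204), ("ro", 204, 204), ("ru", 204, 204), ("sd", 204, 204),
    ("sr", 204, 204), ("sv", 204, 204), ("ta", 204, 204), ("te", 204, 204),
    ("th", 204, 204), ("tr", 204, 204), ("ug", 204, 204), ("uk", 204, 204),
    ("ur", 204, 204), ("vi", 204, 204), ("zh", 408, 408),
]


def accuracy_pct(correct: int, total: int) -> float:
    """Per-language accuracy in percent, rounded to one decimal as in the table."""
    return round(100.0 * correct / total, 1)


def micro_accuracy(rows) -> float:
    """Pool every sample: sum(correct) / sum(total), so zh counts double."""
    correct = sum(c for _, c, _ in rows)
    total = sum(t for _, _, t in rows)
    return correct / total


def macro_accuracy(rows) -> float:
    """Unweighted mean of per-language accuracies (each language counts once)."""
    return sum(c / t for _, c, t in rows) / len(rows)


if __name__ == "__main__":
    # Print only the languages with at least one misclassified sample.
    for lang, c, t in ROWS:
        if c != t:
            print(f"{lang}: {accuracy_pct(c, t)}%")
    print(f"micro: {100 * micro_accuracy(ROWS):.2f}%")
    print(f"macro: {100 * macro_accuracy(ROWS):.2f}%")
```

Only four languages (`da`, `id`, `mk`, `nl`) have any misclassified samples, so the two averages land close together. The micro-average is about 99.92% and the macro-average about 99.91%.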
|