Update README.md
README.md CHANGED

@@ -66,7 +66,7 @@ It is split into two evaluation datasets EHRI-6 (714k tokens) and EHRI-9 (877k t
 Improvements from the XLM-RoBERTa-large checkpoint.
 The 490M test set is split from the dataset used to train this model and has a greater proportion of machine translations than the 42M test set.
 
-Perplexity per language in the EHRI data:
+Perplexity per language in the EHRI data, number of tokens given in parentheses:
 | Model             | cs (195k) | de (356k) | en (81k) | fr (3.5k) | hu (45k) | nl (2.5k) | pl (34k) | sk (6k) | yi (151k)  |
 | ----------------- | --------- | --------- | -------- | --------- | -------- | --------- | -------- | ------- | ---------- |
 | XLM-RoBERTa-large | 3.1553    | 3.4038    | 3.0588   | 2.0579    | 2.8928   | 2.9133    | 2.5284   | 2.6245  | **4.0217** |
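For context on the metric in the table above: perplexity is the exponential of the mean per-token negative log-likelihood. A minimal sketch of that relationship, using made-up NLL values rather than anything from the EHRI evaluation:

```python
import math

def perplexity(nlls):
    """Perplexity = exp(mean negative log-likelihood per token, natural log)."""
    return math.exp(sum(nlls) / len(nlls))

# Hypothetical per-token NLLs for illustration only (not real EHRI data).
sample_nlls = [1.2, 0.9, 1.5, 1.0]
print(perplexity(sample_nlls))  # exp(1.15)
```

Lower values mean the model assigns higher probability to the held-out tokens, which is why a per-language breakdown like the table's exposes where the checkpoint is weakest.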