Fill-Mask
Transformers
Safetensors
xlm-roberta
holocaust
speech
historical
ChrisBridges committed (verified)
Commit bbab3f6 · Parent(s): 7def7c5

Update README.md

Files changed (1): README.md (+1 -1)
README.md CHANGED
@@ -66,7 +66,7 @@ It is split into two evaluation datasets EHRI-6 (714k tokens) and EHRI-9 (877k t
 Improvements from the XLM-RoBERTa-large checkpoint.
 The 490M test set is split from the dataset used to train this model and has a greater proportion of machine translations than the 42M test set.
 
-Perplexity per language in the EHRI data:
+Perplexity per language in the EHRI data, number of tokens given in parentheses:
 | Model | cs (195k) | de (356k) | en (81k) | fr (3.5k) | hu (45k) | nl (2.5k) | pl (34k) | sk (6k) | yi (151k) |
 | ----------------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- | ---------- |
 | XLM-RoBERTa-large | 3.1553 | 3.4038 | 3.0588 | 2.0579 | 2.8928 | 2.9133 | 2.5284 | 2.6245 | **4.0217** |
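The perplexity figures in the table above are exp of the mean per-token negative log-likelihood. A minimal sketch of that arithmetic, using hypothetical log-probability values (real values would come from the masked language model's predictions for each token):

```python
import math

# Hypothetical per-token log-probabilities (natural log) for a short sequence;
# in practice these would be the model's log-probabilities for each token.
token_logprobs = [-1.2, -0.7, -2.1, -0.9]

# Perplexity = exp(negative mean log-likelihood)
nll = -sum(token_logprobs) / len(token_logprobs)
perplexity = math.exp(nll)
print(perplexity)
```

Lower perplexity means the model assigns higher probability to the held-out tokens, which is why the bolded Yiddish (yi) score of 4.0217 marks the weakest language for the baseline checkpoint.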