Commit · 1fcda03
Parent(s): 97861fb
readme updated
README.md CHANGED

@@ -25,10 +25,26 @@ More precisely, it was pretrained with the Masked language modeling (MLM) objective

This way, the model learns an inner representation of 100 languages that can then be used to extract features useful for downstream tasks: if you have a dataset of labeled sentences, for instance, you can train a standard classifier using the features produced by the XLM-RoBERTa model as inputs.

-## Training
+## Training procedure
+
+Fine-tuning was done via the `Trainer` API. Here is the [Colab notebook](https://colab.research.google.com/drive/15LJTckS6gU3RQOmjLqxVNBmbsBdnUEvl?usp=sharing) with the training code.
+
+### Training hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 2e-5
+- train_batch_size: 8
+- eval_batch_size: 16
+- optimizer: Adam
+- evaluation_strategy: epoch
+- num_epochs: 3
+- warmup_steps: 100
+
+## Training results
+
+| Training Loss | Epoch | Validation Loss | Accuracy | F1     |
+|:-------------:|:-----:|:---------------:|:--------:|:------:|
+| 0.003000      | 1     | 0.083116        | 0.9861   | 0.9863 |
+| 0.000900      | 2     | 0.069443        | 0.9872   | 0.9874 |
+| 0.087900      | 3     | 0.067496        | 0.9884   | 0.9885 |
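
The hyperparameters added in this commit map directly onto the `TrainingArguments` API that the README's `Trainer` reference implies. The following is a minimal sketch of such a setup, not the notebook's actual code: the dataset name, label count, and metric wiring are placeholder assumptions, since the commit only links the Colab notebook.

```python
# Sketch of a Trainer setup matching the hyperparameters listed in the commit.
# The dataset ("some_text_dataset"), num_labels, and metrics are placeholders.
import numpy as np
from datasets import load_dataset
from sklearn.metrics import accuracy_score, f1_score
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, num_labels=20  # placeholder: the commit does not state the label count
)

# Placeholder dataset: any dataset with "text" and "label" columns works here.
raw = load_dataset("some_text_dataset")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = raw.map(tokenize, batched=True)

def compute_metrics(eval_pred):
    # Report the same metrics as the results table: accuracy and F1.
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, preds),
        "f1": f1_score(labels, preds, average="weighted"),
    }

args = TrainingArguments(
    output_dir="xlm-roberta-finetuned",
    learning_rate=2e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=16,
    evaluation_strategy="epoch",   # evaluate once per epoch, as in the table
    num_train_epochs=3,
    warmup_steps=100,
)  # the Trainer's default optimizer is AdamW, matching "optimizer: Adam"

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
trainer.train()
```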
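The unchanged context line about training "a standard classifier using the features produced by the XLM-RoBERTa model" can also be illustrated directly. This sketch uses toy sentences and a scikit-learn classifier as assumptions; the README does not prescribe either.

```python
# Sketch of the feature-extraction use described in the README context line:
# XLM-RoBERTa hidden states feed a standard classifier. Data is a toy placeholder.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

sentences = ["Hello world", "Bonjour le monde", "Hallo Welt", "Hola mundo"]  # toy data
labels = [0, 1, 2, 3]  # toy labels

with torch.no_grad():
    enc = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    out = model(**enc)
    # Use the first (<s>) token's hidden state as a fixed-size sentence feature.
    features = out.last_hidden_state[:, 0, :].numpy()

# Any standard classifier can consume these features; logistic regression shown.
clf = LogisticRegression(max_iter=1000).fit(features, labels)
print(clf.predict(features))
```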