# xlm-roberta-large-bm
This model is a fine-tuned version of oza75/xlm-roberta-large-bm-cpt on an unknown dataset. It achieves the following results on the evaluation set:
- Loss: 3.6527
- Accuracy: 0.7539
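The card does not state the task head, but the base checkpoint's `-cpt` suffix (continued pretraining) and the token-level accuracy metric suggest a masked-language-modeling objective. A minimal usage sketch under that assumption (the fill-mask head and the example sentence are assumptions, not confirmed by the card):

```python
from transformers import pipeline

# Hypothetical usage: assumes this checkpoint exposes a fill-mask head
# and is available on the Hugging Face Hub under this repo id.
fill_mask = pipeline("fill-mask", model="oza75/xlm-roberta-large-bm")

# XLM-RoBERTa tokenizers use "<mask>" as the mask token; the input
# sentence here is an arbitrary placeholder.
predictions = fill_mask("I ni <mask>.", top_k=3)
for p in predictions:
    print(p["token_str"], p["score"])
```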
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1.75e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 3
- gradient_accumulation_steps: 4
- total_train_batch_size: 384
- total_eval_batch_size: 96
- optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 0.06
- num_epochs: 50.0
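The reported effective batch sizes follow directly from the per-device settings above, and the cosine schedule with warmup can be sketched in a few lines. Note the card lists `lr_scheduler_warmup_steps: 0.06`, which reads like a warmup *ratio*; the 273 warmup steps and 4550 total steps below are illustrative assumptions (6% of a hypothetical total), not values from the card.

```python
import math

# Effective batch sizes implied by the per-device hyperparameters above.
train_batch_size = 32              # per device
eval_batch_size = 32               # per device
num_devices = 3
gradient_accumulation_steps = 4

total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
total_eval_batch_size = eval_batch_size * num_devices
print(total_train_batch_size, total_eval_batch_size)  # 384 96

# Sketch of a cosine schedule with linear warmup, in the style of
# transformers' get_cosine_schedule_with_warmup (re-implemented here for
# illustration; warmup_steps and total_steps are hypothetical).
def cosine_lr(step, peak_lr=1.75e-5, warmup_steps=273, total_steps=4550):
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
```

The learning rate rises linearly to its peak over the warmup phase, then decays along a half-cosine to zero at the final step.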
### Training results
| Training Loss | Epoch | Step | Validation Loss | Accuracy |
|---|---|---|---|---|
| 23.4593 | 2.1994 | 200 | 5.6475 | 0.6440 |
| 20.8680 | 4.3989 | 400 | 5.0077 | 0.6808 |
| 19.5976 | 6.5983 | 600 | 4.6963 | 0.6985 |
| 18.6174 | 8.7978 | 800 | 4.4655 | 0.7104 |
| 17.8897 | 10.9972 | 1000 | 4.3012 | 0.7211 |
| 17.1157 | 13.1884 | 1200 | 4.2094 | 0.7253 |
| 16.6388 | 15.3878 | 1400 | 4.0842 | 0.7310 |
| 16.5434 | 17.5873 | 1600 | 4.0012 | 0.7376 |
| 16.2096 | 19.7867 | 1800 | 3.9556 | 0.7376 |
| 15.9932 | 21.9861 | 2000 | 3.8802 | 0.7426 |
| 15.3912 | 24.1773 | 2200 | 3.8404 | 0.7442 |
| 15.2444 | 26.3767 | 2400 | 3.7937 | 0.7475 |
| 15.3315 | 28.5762 | 2600 | 3.7470 | 0.7488 |
| 15.2022 | 30.7756 | 2800 | 3.7129 | 0.7513 |
| 15.1072 | 32.9751 | 3000 | 3.7143 | 0.7516 |
| 14.8385 | 35.1662 | 3200 | 3.7064 | 0.7505 |
| 14.7511 | 37.3657 | 3400 | 3.6804 | 0.7535 |
| 14.9010 | 39.5651 | 3600 | 3.6705 | 0.7533 |
| 14.8393 | 41.7645 | 3800 | 3.6890 | 0.7521 |
| 14.8144 | 43.9640 | 4000 | 3.6512 | 0.7547 |
| 14.5626 | 46.1551 | 4200 | 3.6255 | 0.7557 |
| 14.5697 | 48.3546 | 4400 | 3.6409 | 0.7549 |
### Framework versions
- Transformers 5.3.0.dev0
- Pytorch 2.4.1+cu124
- Datasets 4.6.1
- Tokenizers 0.22.2
### Base model

- oza75/xlm-roberta-large-bm-cpt