# moe_g
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.7150
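Assuming the reported loss is the mean token-level cross-entropy in nats (the usual Trainer convention), this corresponds to an evaluation perplexity of roughly exp(4.7150) ≈ 111.6.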
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a reproduction sketch follows the list):
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 48952
- training_steps: 489524
- mixed_precision_training: Native AMP
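For reference, the hyperparameters above roughly correspond to the `TrainingArguments` configuration below. This is a minimal sketch, not the exact training script used: the output directory is a placeholder, and `fp16=True` is an assumption for "Native AMP" (bf16 is also possible depending on the hardware).

```python
from transformers import TrainingArguments

# Sketch of a TrainingArguments setup matching the listed hyperparameters.
training_args = TrainingArguments(
    output_dir="moe_g",                # placeholder output directory
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    gradient_accumulation_steps=4,     # effective train batch size: 8 * 4 = 32
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=48_952,
    max_steps=489_524,
    fp16=True,                         # assumed float16 Native AMP mixed precision
)
```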
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9795 |
| 7.5759 | 0.2043 | 10000 | 7.5365 |
| 6.0757 | 0.4086 | 20000 | 6.0247 |
| 5.3575 | 0.6128 | 30000 | 5.3397 |
| 5.0644 | 0.8171 | 40000 | 5.0423 |
| 4.7435 | 1.0214 | 50000 | 4.8210 |
| 4.6448 | 1.2257 | 60000 | 4.6657 |
| 4.551 | 1.4299 | 70000 | 4.5522 |
| 4.4603 | 1.6342 | 80000 | 4.4658 |
| 4.4004 | 1.8385 | 90000 | 4.3951 |
| 4.135 | 2.0428 | 100000 | 4.3416 |
| 4.1413 | 2.2471 | 110000 | 4.3071 |
| 4.1372 | 2.4513 | 120000 | 4.2719 |
| 4.1268 | 2.6556 | 130000 | 4.2369 |
| 4.11 | 2.8599 | 140000 | 4.2044 |
| 3.8058 | 3.0642 | 150000 | 4.2044 |
| 3.8508 | 3.2684 | 160000 | 4.1921 |
| 3.8827 | 3.4727 | 170000 | 4.1721 |
| 3.8842 | 3.6770 | 180000 | 4.1516 |
| 3.8795 | 3.8813 | 190000 | 4.1302 |
| 3.5506 | 4.0856 | 200000 | 4.1772 |
| 3.6215 | 4.2898 | 210000 | 4.1755 |
| 3.6136 | 4.4941 | 220000 | 4.1624 |
| 3.6512 | 4.6984 | 230000 | 4.1430 |
| 3.6687 | 4.9027 | 240000 | 4.1261 |
| 3.2695 | 5.1069 | 250000 | 4.2228 |
| 3.3309 | 5.3112 | 260000 | 4.2313 |
| 3.3674 | 5.5155 | 270000 | 4.2179 |
| 3.4091 | 5.7198 | 280000 | 4.2003 |
| 3.4325 | 5.9241 | 290000 | 4.1842 |
| 2.9939 | 6.1283 | 300000 | 4.3241 |
| 3.0435 | 6.3326 | 310000 | 4.3383 |
| 3.1034 | 6.5369 | 320000 | 4.3293 |
| 3.1355 | 6.7412 | 330000 | 4.3161 |
| 3.134 | 6.9454 | 340000 | 4.3015 |
| 2.7404 | 7.1497 | 350000 | 4.4597 |
| 2.7835 | 7.3540 | 360000 | 4.4807 |
| 2.8187 | 7.5583 | 370000 | 4.4815 |
| 2.8291 | 7.7626 | 380000 | 4.4772 |
| 2.8283 | 7.9668 | 390000 | 4.4730 |
| 2.4822 | 8.1711 | 400000 | 4.6015 |
| 2.512 | 8.3754 | 410000 | 4.6191 |
| 2.5151 | 8.5797 | 420000 | 4.6252 |
| 2.5148 | 8.7839 | 430000 | 4.6268 |
| 2.5285 | 8.9882 | 440000 | 4.6275 |
| 2.4292 | 9.1927 | 450000 | 4.7047 |
| 2.4565 | 9.3970 | 460000 | 4.7123 |
| 2.4553 | 9.6012 | 470000 | 4.7143 |
| 2.4479 | 9.8055 | 480000 | 4.7146 |
### Framework versions
- Transformers 4.51.0
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1