# moe_g_smaller
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.1824
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure
### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 48952
- training_steps: 489524
- mixed_precision_training: Native AMP
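The effective batch size and warmup length follow directly from the values above. A minimal sanity check of those relationships (variable names are illustrative, values transcribed from this card):

```python
# Values transcribed from the hyperparameter list above.
learning_rate = 1e-4
train_batch_size = 8            # per-device batch size
gradient_accumulation_steps = 4
warmup_steps = 48_952
training_steps = 489_524

# Effective batch size = per-device batch * accumulation steps.
total_train_batch_size = train_batch_size * gradient_accumulation_steps
print(total_train_batch_size)                 # matches the reported 32

# Warmup covers roughly the first 10% of the linear schedule.
print(warmup_steps / training_steps)          # ≈ 0.1
```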
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9822 |
| 7.7303 | 0.2043 | 10000 | 7.6901 |
| 6.4004 | 0.4086 | 20000 | 6.3512 |
| 5.6444 | 0.6128 | 30000 | 5.6093 |
| 5.2576 | 0.8171 | 40000 | 5.2313 |
| 4.9578 | 1.0214 | 50000 | 5.0023 |
| 4.8172 | 1.2257 | 60000 | 4.8182 |
| 4.704 | 1.4299 | 70000 | 4.6885 |
| 4.6026 | 1.6342 | 80000 | 4.5951 |
| 4.539 | 1.8385 | 90000 | 4.5199 |
| 4.3395 | 2.0428 | 100000 | 4.4624 |
| 4.3334 | 2.2471 | 110000 | 4.4249 |
| 4.3147 | 2.4513 | 120000 | 4.3870 |
| 4.2981 | 2.6556 | 130000 | 4.3512 |
| 4.2773 | 2.8599 | 140000 | 4.3192 |
| 4.0874 | 3.0642 | 150000 | 4.3020 |
| 4.108 | 3.2684 | 160000 | 4.2842 |
| 4.1255 | 3.4727 | 170000 | 4.2625 |
| 4.1187 | 3.6770 | 180000 | 4.2417 |
| 4.1067 | 3.8813 | 190000 | 4.2223 |
| 3.9283 | 4.0856 | 200000 | 4.2262 |
| 3.9779 | 4.2898 | 210000 | 4.2159 |
| 3.9489 | 4.4941 | 220000 | 4.2002 |
| 3.9762 | 4.6984 | 230000 | 4.1840 |
| 3.9812 | 4.9027 | 240000 | 4.1695 |
| 3.9538 | 5.0 | 244765 | 4.1619 |
| 3.8109 | 5.1070 | 250000 | 4.1865 |
| 3.8151 | 5.3113 | 260000 | 4.1802 |
| 3.8114 | 5.5156 | 270000 | 4.1682 |
| 3.8476 | 5.7199 | 280000 | 4.1554 |
| 3.8423 | 5.9242 | 290000 | 4.1430 |
| 3.6525 | 6.1285 | 300000 | 4.1762 |
| 3.6778 | 6.3327 | 310000 | 4.1686 |
| 3.7166 | 6.5370 | 320000 | 4.1582 |
| 3.7423 | 6.7413 | 330000 | 4.1454 |
| 3.7259 | 6.9456 | 340000 | 4.1333 |
| 3.5657 | 7.1498 | 350000 | 4.1738 |
| 3.5815 | 7.3541 | 360000 | 4.1683 |
| 3.615 | 7.5584 | 370000 | 4.1591 |
| 3.6129 | 7.7627 | 380000 | 4.1502 |
| 3.6105 | 7.9670 | 390000 | 4.1394 |
| 3.4408 | 8.1712 | 400000 | 4.1830 |
| 3.4751 | 8.3755 | 410000 | 4.1787 |
| 3.465 | 8.5798 | 420000 | 4.1730 |
| 3.4814 | 8.7841 | 430000 | 4.1652 |
| 3.4932 | 8.9883 | 440000 | 4.1580 |
| 3.3643 | 9.1926 | 450000 | 4.1924 |
| 3.3672 | 9.3969 | 460000 | 4.1906 |
| 3.3694 | 9.6012 | 470000 | 4.1869 |
| 3.3765 | 9.8055 | 480000 | 4.1840 |
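Validation loss bottoms out mid-training and drifts upward in later epochs, so the best checkpoint is not the final one. A minimal sketch of picking it out of the evaluation log (step/loss pairs transcribed from later rows of the table above; names are illustrative):

```python
# (step, validation loss) pairs transcribed from the table above
# (later epochs only, for brevity).
eval_log = [
    (240000, 4.1695),
    (244765, 4.1619),
    (290000, 4.1430),
    (340000, 4.1333),
    (390000, 4.1394),
    (440000, 4.1580),
    (480000, 4.1840),
]

# Select the checkpoint with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(best_step, best_loss)  # → 340000 4.1333
```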
### Framework versions
- Transformers 4.51.0
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1