# moe_g_fewer

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.4850
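
Assuming the reported loss is a mean token-level cross-entropy (as the Hugging Face Trainer reports for language models; the card does not say so explicitly), this corresponds to a perplexity of roughly exp(4.4850) ≈ 88.7:

```python
import math

# Hypothetical reading: if the eval loss is mean token-level
# cross-entropy, perplexity is simply exp(loss).
print(math.exp(4.4850))  # ≈ 88.7
```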
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 32
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 48952
- training_steps: 489524
- mixed_precision_training: Native AMP
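
For reference, these settings map onto Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch, assuming the standard `Trainer` was used; `output_dir` is a placeholder, not part of this card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="moe_g_fewer",          # placeholder path
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=4,     # effective train batch size: 8 * 4 = 32
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=48952,
    max_steps=489524,
    fp16=True,                         # "Native AMP" mixed-precision training
)
```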
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9593 |
| 7.6722 | 0.2043 | 10000 | 7.6345 |
| 6.2481 | 0.4086 | 20000 | 6.1965 |
| 5.4523 | 0.6128 | 30000 | 5.4268 |
| 5.1371 | 0.8171 | 40000 | 5.1115 |
| 4.8513 | 1.0214 | 50000 | 4.9161 |
| 4.7254 | 1.2257 | 60000 | 4.7347 |
| 4.6202 | 1.4299 | 70000 | 4.6103 |
| 4.5199 | 1.6342 | 80000 | 4.5181 |
| 4.4556 | 1.8385 | 90000 | 4.4434 |
| 4.215 | 2.0428 | 100000 | 4.3866 |
| 4.2193 | 2.2471 | 110000 | 4.3500 |
| 4.2089 | 2.4513 | 120000 | 4.3132 |
| 4.1947 | 2.6556 | 130000 | 4.2772 |
| 4.1784 | 2.8599 | 140000 | 4.2443 |
| 3.9164 | 3.0642 | 150000 | 4.2354 |
| 3.9543 | 3.2684 | 160000 | 4.2205 |
| 3.9829 | 3.4727 | 170000 | 4.1993 |
| 3.9822 | 3.6770 | 180000 | 4.1776 |
| 3.9745 | 3.8813 | 190000 | 4.1578 |
| 3.7046 | 4.0856 | 200000 | 4.1851 |
| 3.7697 | 4.2898 | 210000 | 4.1768 |
| 3.7587 | 4.4941 | 220000 | 4.1615 |
| 3.7922 | 4.6984 | 230000 | 4.1442 |
| 3.8065 | 4.9027 | 240000 | 4.1281 |
| 3.7782 | 5.0 | 244765 | 4.1190 |
| 3.5211 | 5.1070 | 250000 | 4.1852 |
| 3.5476 | 5.3113 | 260000 | 4.1843 |
| 3.5629 | 5.5156 | 270000 | 4.1702 |
| 3.6071 | 5.7199 | 280000 | 4.1553 |
| 3.6042 | 5.9242 | 290000 | 4.1417 |
| 3.2831 | 6.1285 | 300000 | 4.2341 |
| 3.3392 | 6.3327 | 310000 | 4.2326 |
| 3.3938 | 6.5370 | 320000 | 4.2171 |
| 3.4265 | 6.7413 | 330000 | 4.2025 |
| 3.4233 | 6.9456 | 340000 | 4.1863 |
| 3.1121 | 7.1498 | 350000 | 4.3056 |
| 3.153 | 7.3541 | 360000 | 4.3045 |
| 3.2056 | 7.5584 | 370000 | 4.2957 |
| 3.2119 | 7.7627 | 380000 | 4.2851 |
| 3.2139 | 7.9670 | 390000 | 4.2719 |
| 2.9033 | 8.1712 | 400000 | 4.3931 |
| 2.9548 | 8.3755 | 410000 | 4.3987 |
| 2.9612 | 8.5798 | 420000 | 4.3953 |
| 2.9801 | 8.7841 | 430000 | 4.3889 |
| 2.9958 | 8.9883 | 440000 | 4.3819 |
| 2.7425 | 9.1926 | 450000 | 4.4773 |
| 2.7596 | 9.3969 | 460000 | 4.4854 |
| 2.7614 | 9.6012 | 470000 | 4.4864 |
| 2.773 | 9.8055 | 480000 | 4.4853 |
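
Validation loss reaches its minimum of 4.1190 at epoch 5.0 (step 244765) and climbs afterwards while training loss keeps falling, a typical overfitting pattern. A minimal sketch for visualizing this divergence, assuming matplotlib; the values are sampled from the table above near epoch boundaries:

```python
import matplotlib.pyplot as plt

# Rows sampled from the results table above (rounded).
epochs     = [1.02, 2.04, 3.06, 4.09, 5.00, 6.13, 7.15, 8.17, 9.19, 9.81]
train_loss = [4.85, 4.22, 3.92, 3.70, 3.78, 3.28, 3.11, 2.90, 2.74, 2.77]
val_loss   = [4.92, 4.39, 4.24, 4.19, 4.12, 4.23, 4.31, 4.39, 4.48, 4.49]

plt.plot(epochs, train_loss, label="training loss")
plt.plot(epochs, val_loss, label="validation loss")
plt.axvline(5.0, linestyle="--", color="gray")  # best validation loss (4.1190)
plt.xlabel("epoch")
plt.ylabel("loss")
plt.legend()
plt.show()
```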
### Framework versions
- Transformers 4.51.0
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1