# moe_tur_multi_batch_8
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 5.9074
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 2
- eval_batch_size: 32
- seed: 42
- gradient_accumulation_steps: 4
- total_train_batch_size: 8
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 38244
- training_steps: 382446
- mixed_precision_training: Native AMP
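The schedule above ramps the learning rate up over the first 38,244 steps and then decays it linearly to zero at step 382,446, while gradient accumulation multiplies the per-device batch of 2 into the total batch of 8. A minimal sketch of that schedule (mirroring the behavior of a linear scheduler with warmup; the function name here is illustrative, not part of any library):

```python
# Values taken from the hyperparameter list above.
BASE_LR = 1e-4
WARMUP_STEPS = 38_244
TRAINING_STEPS = 382_446

def linear_lr(step: int) -> float:
    """Learning rate at a given optimizer step under linear warmup + decay."""
    if step < WARMUP_STEPS:
        # Warmup: ramp linearly from 0 up to BASE_LR.
        return BASE_LR * step / WARMUP_STEPS
    # Decay: fall linearly from BASE_LR to 0 at TRAINING_STEPS.
    remaining = TRAINING_STEPS - step
    return BASE_LR * max(0.0, remaining / (TRAINING_STEPS - WARMUP_STEPS))

# Effective batch: per-device batch x gradient accumulation steps.
effective_batch = 2 * 4  # matches total_train_batch_size: 8

print(linear_lr(0))               # 0.0 (start of warmup)
print(linear_lr(WARMUP_STEPS))    # peak, 1e-4
print(linear_lr(TRAINING_STEPS))  # 0.0 (end of training)
```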
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9789 |
| 7.4445 | 0.2615 | 10000 | 7.3980 |
| 5.6546 | 0.5229 | 20000 | 5.6101 |
| 5.036 | 0.7844 | 30000 | 5.0138 |
| 4.6238 | 1.0459 | 40000 | 4.6896 |
| 4.445 | 1.3074 | 50000 | 4.4645 |
| 4.2684 | 1.5689 | 60000 | 4.3137 |
| 4.2097 | 1.8303 | 70000 | 4.2050 |
| 3.9034 | 2.0918 | 80000 | 4.1332 |
| 3.8426 | 2.3533 | 90000 | 4.0845 |
| 3.8379 | 2.6148 | 100000 | 4.0366 |
| 3.8142 | 2.8763 | 110000 | 3.9902 |
| 3.4415 | 3.1377 | 120000 | 4.0038 |
| 3.5026 | 3.3992 | 130000 | 3.9916 |
| 3.5018 | 3.6607 | 140000 | 3.9655 |
| 3.5207 | 3.9222 | 150000 | 3.9371 |
| 3.5128 | 4.0000 | 152976 | 3.9287 |
| 2.9641 | 4.1837 | 160000 | 4.0548 |
| 3.038 | 4.4451 | 170000 | 4.0626 |
| 3.1051 | 4.7066 | 180000 | 4.0439 |
| 3.1606 | 4.9681 | 190000 | 4.0237 |
| 2.4678 | 5.2296 | 200000 | 4.2661 |
| 2.5802 | 5.4910 | 210000 | 4.2925 |
| 2.6698 | 5.7525 | 220000 | 4.2789 |
| 1.8528 | 6.0140 | 230000 | 4.3869 |
| 1.9577 | 6.2755 | 240000 | 4.6120 |
| 2.0102 | 6.5370 | 250000 | 4.6567 |
| 2.097 | 6.7984 | 260000 | 4.6590 |
| 2.1054 | 7.0000 | 267708 | 4.6630 |
| 1.3511 | 7.0599 | 270000 | 4.8671 |
| 1.3987 | 7.3214 | 280000 | 5.0544 |
| 1.5008 | 7.5829 | 290000 | 5.1165 |
| 1.5377 | 7.8444 | 300000 | 5.1459 |
| 0.9324 | 8.1058 | 310000 | 5.3730 |
| 0.9891 | 8.3673 | 320000 | 5.5069 |
| 1.0269 | 8.6288 | 330000 | 5.5789 |
| 1.0243 | 8.8903 | 340000 | 5.6237 |
| 0.6486 | 9.1518 | 350000 | 5.7925 |
| 0.6572 | 9.4132 | 360000 | 5.8604 |
| 0.6697 | 9.6747 | 370000 | 5.8925 |
| 0.6738 | 9.9362 | 380000 | 5.9074 |
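The table shows validation loss bottoming out at the end of epoch 4 (3.9287 at step 152,976) and rising steadily afterward while training loss keeps falling, i.e. the model overfits in later epochs. A minimal sketch of picking the best checkpoint from a few (step, validation loss) pairs sampled from the table:

```python
# A subset of (step, validation_loss) pairs from the table above.
val_losses = {
    0: 10.9789,
    10_000: 7.3980,
    152_976: 3.9287,   # end of epoch 4
    190_000: 4.0237,
    267_708: 4.6630,   # end of epoch 7
    380_000: 5.9074,
}

# The checkpoint worth keeping is the one with the lowest validation loss.
best_step = min(val_losses, key=val_losses.get)
print(best_step, val_losses[best_step])  # 152976 3.9287
```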
### Framework versions
- Transformers 4.51.0
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1