# moe_e_hom
This model was fine-tuned on the arrow dataset (the base model is not specified in this card). It achieves the following results on the evaluation set:
- Loss: 5.2265
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch in `TrainingArguments` form follows the list):
- learning_rate: 0.0001
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 1024
- total_eval_batch_size: 64
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 20871
- mixed_precision_training: Native AMP
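
For reference, the configuration above maps onto Hugging Face `TrainingArguments` roughly as follows. This is a minimal sketch, assuming `fp16=True` for the "Native AMP" setting and an illustrative `output_dir`; the effective train batch size of 1024 comes from 32 per device × 2 GPUs × 16 accumulation steps.

```python
# Sketch of the hyperparameters above as TrainingArguments
# (transformers 4.53.1). output_dir and fp16 are assumptions;
# the card only says "Native AMP" for mixed precision.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="moe_e_hom",          # assumed; not stated in the card
    learning_rate=1e-4,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    gradient_accumulation_steps=16,  # 32 per device * 2 GPUs * 16 = 1024
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=20871,
    fp16=True,                       # "Native AMP" mixed precision
)
```

The two-device, multi-GPU setting corresponds to launching the training script with `torchrun --nproc_per_node=2`.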
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 11.0172 |
| 7.6961 | 0.4791 | 1000 | 6.8598 |
| 6.4686 | 0.9582 | 2000 | 5.7974 |
| 6.0292 | 1.4370 | 3000 | 5.4299 |
| 5.8161 | 1.9161 | 4000 | 5.2142 |
| 5.5642 | 2.3948 | 5000 | 5.0939 |
| 5.4886 | 2.8739 | 6000 | 4.9927 |
| 5.2586 | 3.3526 | 7000 | 4.9522 |
| 5.2387 | 3.8317 | 8000 | 4.8943 |
| 5.0007 | 4.3105 | 9000 | 4.9058 |
| 5.0116 | 4.7896 | 10000 | 4.8711 |
| 4.7478 | 5.2683 | 11000 | 4.9165 |
| 4.7846 | 5.7474 | 12000 | 4.9028 |
| 4.509 | 6.2261 | 13000 | 4.9728 |
| 4.5599 | 6.7053 | 14000 | 4.9730 |
| 4.2942 | 7.1840 | 15000 | 5.0498 |
| 4.347 | 7.6631 | 16000 | 5.0659 |
| 4.3628 | 8.0 | 16704 | 5.0655 |
| 4.1082 | 8.1418 | 17000 | 5.1341 |
| 4.1578 | 8.6209 | 18000 | 5.1577 |
| 3.9673 | 9.0997 | 19000 | 5.2038 |
| 3.9913 | 9.5788 | 20000 | 5.2233 |
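
Validation loss reaches its minimum of 4.8711 around step 10000 and drifts upward afterward while training loss keeps falling, so the best checkpoint precedes the final one. If these losses are mean token-level cross-entropy (the usual `Trainer` convention for language models; an assumption, since the card does not say), they convert to perplexity as in this sketch:

```python
import math

# Assumes the reported losses are mean token-level cross-entropy,
# in which case perplexity = exp(loss). Not confirmed by the card.
best_eval_ppl = math.exp(4.8711)   # ~130.5 at step 10000 (best checkpoint)
final_eval_ppl = math.exp(5.2265)  # ~186.2 at the final evaluation
print(f"best: {best_eval_ppl:.1f}, final: {final_eval_ppl:.1f}")
```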
### Framework versions
- Transformers 4.53.1
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1