# mixtral_5_6gpu
This model is a fine-tuned version of an unspecified base model, trained on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.3696
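Assuming the reported loss is the mean token-level cross-entropy in nats (the usual convention for causal-LM evaluation in Transformers), the corresponding perplexity is just its exponential. A minimal sketch:

```python
import math

# Final validation loss from the results table below (assumed to be
# cross-entropy in nats per token).
eval_loss = 4.3696
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 79
```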
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 16
- total_train_batch_size: 384
- total_eval_batch_size: 48
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 40746
- mixed_precision_training: Native AMP
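The two "total" batch sizes above are derived, not set directly: the effective train batch size is the per-device batch size multiplied by the number of GPUs and the gradient-accumulation steps, while the effective eval batch size (no accumulation) is per-device size times device count. A quick check against the values listed:

```python
# Effective train batch size = per-device batch × num GPUs × grad-accumulation steps
train_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 384

# Effective eval batch size = per-device batch × num GPUs (no accumulation at eval)
eval_batch_size = 8
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)  # 48
```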
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9761 |
| 7.1488 | 0.2454 | 1000 | 6.9551 |
| 5.9011 | 0.4908 | 2000 | 5.8183 |
| 5.4187 | 0.7363 | 3000 | 5.3778 |
| 5.1765 | 0.9817 | 4000 | 5.1484 |
| 4.983 | 1.2270 | 5000 | 5.0035 |
| 4.876 | 1.4724 | 6000 | 4.8925 |
| 4.7906 | 1.7179 | 7000 | 4.7991 |
| 4.7131 | 1.9633 | 8000 | 4.7258 |
| 4.5733 | 2.2086 | 9000 | 4.6749 |
| 4.5394 | 2.4540 | 10000 | 4.6248 |
| 4.5068 | 2.6995 | 11000 | 4.5808 |
| 4.469 | 2.9449 | 12000 | 4.5393 |
| 4.3381 | 3.1902 | 13000 | 4.5207 |
| 4.3277 | 3.4356 | 14000 | 4.4930 |
| 4.3198 | 3.6810 | 15000 | 4.4654 |
| 4.2995 | 3.9265 | 16000 | 4.4391 |
| 4.1697 | 4.1718 | 17000 | 4.4364 |
| 4.1779 | 4.4172 | 18000 | 4.4203 |
| 4.1732 | 4.6626 | 19000 | 4.4012 |
| 4.1631 | 4.9081 | 20000 | 4.3828 |
| 4.0294 | 5.1534 | 21000 | 4.3887 |
| 4.0533 | 5.3988 | 22000 | 4.3801 |
| 4.0511 | 5.6442 | 23000 | 4.3681 |
| 4.0532 | 5.8897 | 24000 | 4.3559 |
| 3.9201 | 6.1350 | 25000 | 4.3686 |
| 3.9407 | 6.3804 | 26000 | 4.3653 |
| 3.9511 | 6.6258 | 27000 | 4.3558 |
| 3.9468 | 6.8712 | 28000 | 4.3467 |
| 3.8237 | 7.1166 | 29000 | 4.3628 |
| 3.8449 | 7.3620 | 30000 | 4.3622 |
| 3.8537 | 7.6074 | 31000 | 4.3554 |
| 3.8602 | 7.8528 | 32000 | 4.3491 |
| 3.7498 | 8.0982 | 33000 | 4.3658 |
| 3.7648 | 8.3436 | 34000 | 4.3675 |
| 3.7633 | 8.5890 | 35000 | 4.3641 |
| 3.7766 | 8.8344 | 36000 | 4.3592 |
| 3.6848 | 9.0798 | 37000 | 4.3705 |
| 3.6937 | 9.3252 | 38000 | 4.3738 |
| 3.6984 | 9.5706 | 39000 | 4.3721 |
| 3.7008 | 9.8160 | 40000 | 4.3701 |
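With the linear scheduler and 500 warmup steps listed above, the learning rate ramps from 0 to the peak of 1e-4 over the first 500 steps, then decays linearly to 0 at step 40746. A minimal sketch of that schedule (mirroring the behavior of `get_linear_schedule_with_warmup` in Transformers):

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=500, total_steps=40746):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_lr(500))    # peak learning rate: 0.0001
print(linear_lr(40746))  # fully decayed: 0.0
```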
### Framework versions
- Transformers 4.53.1
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1