# mixtral_5_6gpu_new_settings_h100
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.7906
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 40746
- mixed_precision_training: Native AMP
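The settings above imply an effective batch size of 8 × 2 × 16 = 256 and a linear-warmup/linear-decay learning-rate schedule. A minimal sketch of both, assuming the standard behavior of the "linear" scheduler (the actual run used the built-in Transformers implementation, not this reimplementation):

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 500
TRAINING_STEPS = 40746

# Effective batch size: per-device batch x devices x gradient accumulation.
per_device_batch = 8
num_devices = 2
grad_accum = 16
total_train_batch_size = per_device_batch * num_devices * grad_accum  # 256

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then linear decay to zero at the final step."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

For example, `lr_at(250)` is half the peak rate, `lr_at(500)` is the full 1e-4, and the rate reaches zero at step 40746.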
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9681 |
| 7.3907 | 0.1636 | 1000 | 7.1875 |
| 5.9337 | 0.3272 | 2000 | 5.8414 |
| 5.3958 | 0.4908 | 3000 | 5.3494 |
| 5.1307 | 0.6545 | 4000 | 5.1029 |
| 4.9703 | 0.8181 | 5000 | 4.9405 |
| 4.8382 | 0.9817 | 6000 | 4.8176 |
| 4.6294 | 1.1453 | 7000 | 4.7346 |
| 4.58 | 1.3089 | 8000 | 4.6590 |
| 4.5336 | 1.4725 | 9000 | 4.5953 |
| 4.4704 | 1.6361 | 10000 | 4.5386 |
| 4.4342 | 1.7997 | 11000 | 4.4860 |
| 4.3908 | 1.9634 | 12000 | 4.4411 |
| 4.1105 | 2.1270 | 13000 | 4.4373 |
| 4.1026 | 2.2906 | 14000 | 4.4111 |
| 4.0997 | 2.4542 | 15000 | 4.3853 |
| 4.0912 | 2.6178 | 16000 | 4.3593 |
| 4.0791 | 2.7814 | 17000 | 4.3366 |
| 4.0683 | 2.9450 | 18000 | 4.3135 |
| 3.6831 | 3.1086 | 19000 | 4.3660 |
| 3.7023 | 3.2723 | 20000 | 4.3685 |
| 3.7204 | 3.4359 | 21000 | 4.3607 |
| 3.7164 | 3.5995 | 22000 | 4.3513 |
| 3.723 | 3.7631 | 23000 | 4.3397 |
| 3.7212 | 3.9267 | 24000 | 4.3282 |
| 3.2371 | 4.0903 | 25000 | 4.4193 |
| 3.2627 | 4.2539 | 26000 | 4.4511 |
| 3.2844 | 4.4175 | 27000 | 4.4612 |
| 3.2914 | 4.5812 | 28000 | 4.4673 |
| 3.3027 | 4.7448 | 29000 | 4.4678 |
| 3.3098 | 4.9084 | 30000 | 4.4663 |
| 2.7994 | 5.0720 | 31000 | 4.5715 |
| 2.8213 | 5.2356 | 32000 | 4.6152 |
| 2.8343 | 5.3992 | 33000 | 4.6370 |
| 2.8467 | 5.5628 | 34000 | 4.6507 |
| 2.8512 | 5.7264 | 35000 | 4.6595 |
| 2.8536 | 5.8901 | 36000 | 4.6647 |
| 2.4456 | 6.0537 | 37000 | 4.7375 |
| 2.4629 | 6.2173 | 38000 | 4.7686 |
| 2.4635 | 6.3809 | 39000 | 4.7827 |
| 2.4631 | 6.5445 | 40000 | 4.7894 |
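The table shows validation loss bottoming out during epoch 3 and climbing afterwards while training loss keeps falling, the usual signature of overfitting. A quick check using step/loss pairs copied from the table (a subset, for brevity):

```python
# Validation loss by step, copied from selected rows of the table above.
eval_loss = {
     6000: 4.8176,
    12000: 4.4411,
    18000: 4.3135,
    24000: 4.3282,
    30000: 4.4663,
    36000: 4.6647,
    40000: 4.7894,
}

# The best checkpoint by validation loss is at step 18000 (epoch ~2.9),
# well before training ended at step 40746.
best_step = min(eval_loss, key=eval_loss.get)
```

If the intent is downstream use, the step-18000 checkpoint (or early stopping around epoch 3) is likely preferable to the final one reported at the top of this card.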
### Framework versions

- Transformers 4.53.1
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1