|
|
--- |
|
|
library_name: transformers |
|
|
tags: |
|
|
- generated_from_trainer |
|
|
datasets: |
|
|
- arrow |
|
|
model-index: |
|
|
- name: mixtral_5_6gpu |
|
|
results: [] |
|
|
--- |
|
|
|
|
|
|
|
|
|
|
# mixtral_5_6gpu |
|
|
|
|
|
This model is a fine-tuned version of an unspecified base model on the arrow dataset.
|
|
It achieves the following results on the evaluation set: |
|
|
- Loss: 4.3696 |
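
The card does not record the base architecture or a hosted checkpoint path. Assuming the saved model is a standard `transformers` causal language model, it could be loaded roughly as follows (`model_id` is a hypothetical placeholder for wherever the checkpoint lives):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical path; replace with the actual repo id or local directory
# where this checkpoint is stored.
model_id = "mixtral_5_6gpu"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Hello, world", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```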
|
|
|
|
|
## Model description |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Intended uses & limitations |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training and evaluation data |
|
|
|
|
|
More information needed |
|
|
|
|
|
## Training procedure |
|
|
|
|
|
### Training hyperparameters |
|
|
|
|
|
The following hyperparameters were used during training; a hedged `TrainingArguments` sketch reproducing them follows the list:
|
|
- learning_rate: 0.0001 |
|
|
- train_batch_size: 4 |
|
|
- eval_batch_size: 8 |
|
|
- seed: 42 |
|
|
- distributed_type: multi-GPU |
|
|
- num_devices: 6 |
|
|
- gradient_accumulation_steps: 16 |
|
|
- total_train_batch_size: 384 |
|
|
- total_eval_batch_size: 48 |
|
|
- optimizer: AdamW (`adamw_torch`) with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
|
|
- lr_scheduler_type: linear |
|
|
- lr_scheduler_warmup_steps: 500 |
|
|
- training_steps: 40746 |
|
|
- mixed_precision_training: Native AMP |
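
A minimal sketch reconstructing these settings as `TrainingArguments`; the `output_dir` and anything not listed above are assumptions, and the 6-GPU launch (via `torchrun` or `accelerate`) is implied rather than configured here:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="mixtral_5_6gpu",        # assumed
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=40746,
    fp16=True,                          # "Native AMP"; could be bf16=True instead
)

# With 6 GPUs, the effective batch sizes match the totals above:
# train: 4 per device * 16 accumulation steps * 6 devices = 384
# eval:  8 per device * 6 devices = 48
```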
|
|
|
|
|
### Training results |
|
|
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|
|:-------------:|:------:|:-----:|:---------------:| |
|
|
| No log | 0 | 0 | 10.9761 | |
|
|
| 7.1488 | 0.2454 | 1000 | 6.9551 | |
|
|
| 5.9011 | 0.4908 | 2000 | 5.8183 | |
|
|
| 5.4187 | 0.7363 | 3000 | 5.3778 | |
|
|
| 5.1765 | 0.9817 | 4000 | 5.1484 | |
|
|
| 4.983 | 1.2270 | 5000 | 5.0035 | |
|
|
| 4.876 | 1.4724 | 6000 | 4.8925 | |
|
|
| 4.7906 | 1.7179 | 7000 | 4.7991 | |
|
|
| 4.7131 | 1.9633 | 8000 | 4.7258 | |
|
|
| 4.5733 | 2.2086 | 9000 | 4.6749 | |
|
|
| 4.5394 | 2.4540 | 10000 | 4.6248 | |
|
|
| 4.5068 | 2.6995 | 11000 | 4.5808 | |
|
|
| 4.469 | 2.9449 | 12000 | 4.5393 | |
|
|
| 4.3381 | 3.1902 | 13000 | 4.5207 | |
|
|
| 4.3277 | 3.4356 | 14000 | 4.4930 | |
|
|
| 4.3198 | 3.6810 | 15000 | 4.4654 | |
|
|
| 4.2995 | 3.9265 | 16000 | 4.4391 | |
|
|
| 4.1697 | 4.1718 | 17000 | 4.4364 | |
|
|
| 4.1779 | 4.4172 | 18000 | 4.4203 | |
|
|
| 4.1732 | 4.6626 | 19000 | 4.4012 | |
|
|
| 4.1631 | 4.9081 | 20000 | 4.3828 | |
|
|
| 4.0294 | 5.1534 | 21000 | 4.3887 | |
|
|
| 4.0533 | 5.3988 | 22000 | 4.3801 | |
|
|
| 4.0511 | 5.6442 | 23000 | 4.3681 | |
|
|
| 4.0532 | 5.8897 | 24000 | 4.3559 | |
|
|
| 3.9201 | 6.1350 | 25000 | 4.3686 | |
|
|
| 3.9407 | 6.3804 | 26000 | 4.3653 | |
|
|
| 3.9511 | 6.6258 | 27000 | 4.3558 | |
|
|
| 3.9468 | 6.8712 | 28000 | 4.3467 | |
|
|
| 3.8237 | 7.1166 | 29000 | 4.3628 | |
|
|
| 3.8449 | 7.3620 | 30000 | 4.3622 | |
|
|
| 3.8537 | 7.6074 | 31000 | 4.3554 | |
|
|
| 3.8602 | 7.8528 | 32000 | 4.3491 | |
|
|
| 3.7498 | 8.0982 | 33000 | 4.3658 | |
|
|
| 3.7648 | 8.3436 | 34000 | 4.3675 | |
|
|
| 3.7633 | 8.5890 | 35000 | 4.3641 | |
|
|
| 3.7766 | 8.8344 | 36000 | 4.3592 | |
|
|
| 3.6848 | 9.0798 | 37000 | 4.3705 | |
|
|
| 3.6937 | 9.3252 | 38000 | 4.3738 | |
|
|
| 3.6984 | 9.5706 | 39000 | 4.3721 | |
|
|
| 3.7008 | 9.8160 | 40000 | 4.3701 | |
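
Validation loss bottoms out around step 28000 (4.3467, epoch ~6.87) and drifts slightly upward afterwards, so the final checkpoint is marginally past the best one. Assuming the reported losses are mean natural-log cross-entropy (the Trainer default for causal language modeling), validation perplexity is `exp(loss)`; a minimal check:

```python
import math

# exp() converts natural-log cross-entropy to perplexity.
final_eval_loss = 4.3696
best_eval_loss = 4.3467   # step 28000

print(math.exp(final_eval_loss))  # ~79.0
print(math.exp(best_eval_loss))   # ~77.2
```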
|
|
|
|
|
|
|
|
### Framework versions |
|
|
|
|
|
- Transformers 4.53.1 |
|
|
- PyTorch 2.7.0+cu126
|
|
- Datasets 3.6.0 |
|
|
- Tokenizers 0.21.1 |
|
|
|