
mixtral_5_6gpu

This model is a fine-tuned version of an unspecified base model, trained on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.3696
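
For scale, this corresponds to a perplexity of roughly 79, assuming (as with the Transformers `Trainer`) that the reported loss is the mean cross-entropy in nats:

```python
import math

# Perplexity is the exponential of the mean cross-entropy loss (in nats).
eval_loss = 4.3696
print(f"perplexity = {math.exp(eval_loss):.1f}")  # about 79.0
```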

Model description

More information needed

Intended uses & limitations

More information needed
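
In the absence of a documented usage example, here is a minimal loading sketch. It assumes the checkpoint is a causal language model stored with its tokenizer in this repository; the repo id below is a placeholder, not the actual Hub path.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id -- substitute the actual Hub path of this model.
repo_id = "your-username/mixtral_5_6gpu"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

At 0.2B parameters in F32, the checkpoint should load comfortably even on CPU.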

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 6
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 384
  • total_eval_batch_size: 48
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06 (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 40746
  • mixed_precision_training: Native AMP
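
As a reproducibility aid, the values above map onto a Transformers `TrainingArguments` configuration roughly as sketched below. This is an assumption-laden reconstruction, not the original training script: the output directory is invented, "Native AMP" is rendered here as `fp16=True` (it could equally have been `bf16`), and the 6-GPU layout comes from the launcher (e.g. `torchrun --nproc_per_node=6`) rather than from these arguments.

```python
from transformers import TrainingArguments

# Sketch of the documented hyperparameters; unlisted options are assumptions.
# Effective batch sizes follow from the per-device values:
#   train: 4 per device x 6 GPUs x 16 accumulation steps = 384
#   eval:  8 per device x 6 GPUs                          = 48
training_args = TrainingArguments(
    output_dir="mixtral_5_6gpu",   # assumed; not stated in the card
    learning_rate=1e-4,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=16,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=40746,
    fp16=True,                     # "Native AMP"; bf16=True is also plausible
)
```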

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0      | 0     | 10.9761         |
| 7.1488        | 0.2454 | 1000  | 6.9551          |
| 5.9011        | 0.4908 | 2000  | 5.8183          |
| 5.4187        | 0.7363 | 3000  | 5.3778          |
| 5.1765        | 0.9817 | 4000  | 5.1484          |
| 4.983         | 1.2270 | 5000  | 5.0035          |
| 4.876         | 1.4724 | 6000  | 4.8925          |
| 4.7906        | 1.7179 | 7000  | 4.7991          |
| 4.7131        | 1.9633 | 8000  | 4.7258          |
| 4.5733        | 2.2086 | 9000  | 4.6749          |
| 4.5394        | 2.4540 | 10000 | 4.6248          |
| 4.5068        | 2.6995 | 11000 | 4.5808          |
| 4.469         | 2.9449 | 12000 | 4.5393          |
| 4.3381        | 3.1902 | 13000 | 4.5207          |
| 4.3277        | 3.4356 | 14000 | 4.4930          |
| 4.3198        | 3.6810 | 15000 | 4.4654          |
| 4.2995        | 3.9265 | 16000 | 4.4391          |
| 4.1697        | 4.1718 | 17000 | 4.4364          |
| 4.1779        | 4.4172 | 18000 | 4.4203          |
| 4.1732        | 4.6626 | 19000 | 4.4012          |
| 4.1631        | 4.9081 | 20000 | 4.3828          |
| 4.0294        | 5.1534 | 21000 | 4.3887          |
| 4.0533        | 5.3988 | 22000 | 4.3801          |
| 4.0511        | 5.6442 | 23000 | 4.3681          |
| 4.0532        | 5.8897 | 24000 | 4.3559          |
| 3.9201        | 6.1350 | 25000 | 4.3686          |
| 3.9407        | 6.3804 | 26000 | 4.3653          |
| 3.9511        | 6.6258 | 27000 | 4.3558          |
| 3.9468        | 6.8712 | 28000 | 4.3467          |
| 3.8237        | 7.1166 | 29000 | 4.3628          |
| 3.8449        | 7.3620 | 30000 | 4.3622          |
| 3.8537        | 7.6074 | 31000 | 4.3554          |
| 3.8602        | 7.8528 | 32000 | 4.3491          |
| 3.7498        | 8.0982 | 33000 | 4.3658          |
| 3.7648        | 8.3436 | 34000 | 4.3675          |
| 3.7633        | 8.5890 | 35000 | 4.3641          |
| 3.7766        | 8.8344 | 36000 | 4.3592          |
| 3.6848        | 9.0798 | 37000 | 4.3705          |
| 3.6937        | 9.3252 | 38000 | 4.3738          |
| 3.6984        | 9.5706 | 39000 | 4.3721          |
| 3.7008        | 9.8160 | 40000 | 4.3701          |
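
Validation loss bottoms out at 4.3467 around step 28000 (epoch ~6.9) and drifts slightly upward afterwards while training loss continues to fall, suggesting mild overfitting in the final epochs; the headline evaluation loss of 4.3696 is consistent with this plateau.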

Framework versions

  • Transformers 4.53.1
  • PyTorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1

Model size: 0.2B parameters (F32 tensors, Safetensors format)