
mixtral_5_6gpu_new_settings_h100

This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:

  • Loss: 4.7906
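
Assuming the reported loss is the standard mean per-token cross-entropy (in nats) of a causal language model, it corresponds to a perplexity of roughly exp(4.7906) ≈ 120. A quick sanity check:

```python
import math

# Assumes the evaluation loss is the usual mean per-token
# cross-entropy (negative log-likelihood) in nats.
eval_loss = 4.7906
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.1f}")  # perplexity = 120.4
```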

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch of the corresponding TrainingArguments follows the list):

  • learning_rate: 0.0001
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 256
  • total_eval_batch_size: 16
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-06, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 500
  • training_steps: 40746
  • mixed_precision_training: Native AMP
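
For reference, here is a minimal sketch of how the values above map onto transformers.TrainingArguments. The output_dir is a placeholder, and the card does not say whether Native AMP ran in fp16 or bf16:

```python
from transformers import TrainingArguments

# Sketch only: reconstructs the hyperparameters listed above.
# Effective train batch size = 8 per device x 2 GPUs x 16 accumulation steps = 256.
args = TrainingArguments(
    output_dir="mixtral_5_6gpu_new_settings_h100",  # placeholder
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    gradient_accumulation_steps=16,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-6,
    lr_scheduler_type="linear",
    warmup_steps=500,
    max_steps=40746,
    fp16=True,  # "Native AMP"; bf16=True is also plausible on H100s
)
```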

Training results

| Training Loss | Epoch  | Step  | Validation Loss |
|:-------------:|:------:|:-----:|:---------------:|
| No log        | 0      | 0     | 10.9681         |
| 7.3907        | 0.1636 | 1000  | 7.1875          |
| 5.9337        | 0.3272 | 2000  | 5.8414          |
| 5.3958        | 0.4908 | 3000  | 5.3494          |
| 5.1307        | 0.6545 | 4000  | 5.1029          |
| 4.9703        | 0.8181 | 5000  | 4.9405          |
| 4.8382        | 0.9817 | 6000  | 4.8176          |
| 4.6294        | 1.1453 | 7000  | 4.7346          |
| 4.58          | 1.3089 | 8000  | 4.6590          |
| 4.5336        | 1.4725 | 9000  | 4.5953          |
| 4.4704        | 1.6361 | 10000 | 4.5386          |
| 4.4342        | 1.7997 | 11000 | 4.4860          |
| 4.3908        | 1.9634 | 12000 | 4.4411          |
| 4.1105        | 2.1270 | 13000 | 4.4373          |
| 4.1026        | 2.2906 | 14000 | 4.4111          |
| 4.0997        | 2.4542 | 15000 | 4.3853          |
| 4.0912        | 2.6178 | 16000 | 4.3593          |
| 4.0791        | 2.7814 | 17000 | 4.3366          |
| 4.0683        | 2.9450 | 18000 | 4.3135          |
| 3.6831        | 3.1086 | 19000 | 4.3660          |
| 3.7023        | 3.2723 | 20000 | 4.3685          |
| 3.7204        | 3.4359 | 21000 | 4.3607          |
| 3.7164        | 3.5995 | 22000 | 4.3513          |
| 3.723         | 3.7631 | 23000 | 4.3397          |
| 3.7212        | 3.9267 | 24000 | 4.3282          |
| 3.2371        | 4.0903 | 25000 | 4.4193          |
| 3.2627        | 4.2539 | 26000 | 4.4511          |
| 3.2844        | 4.4175 | 27000 | 4.4612          |
| 3.2914        | 4.5812 | 28000 | 4.4673          |
| 3.3027        | 4.7448 | 29000 | 4.4678          |
| 3.3098        | 4.9084 | 30000 | 4.4663          |
| 2.7994        | 5.0720 | 31000 | 4.5715          |
| 2.8213        | 5.2356 | 32000 | 4.6152          |
| 2.8343        | 5.3992 | 33000 | 4.6370          |
| 2.8467        | 5.5628 | 34000 | 4.6507          |
| 2.8512        | 5.7264 | 35000 | 4.6595          |
| 2.8536        | 5.8901 | 36000 | 4.6647          |
| 2.4456        | 6.0537 | 37000 | 4.7375          |
| 2.4629        | 6.2173 | 38000 | 4.7686          |
| 2.4635        | 6.3809 | 39000 | 4.7827          |
| 2.4631        | 6.5445 | 40000 | 4.7894          |
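
The validation loss bottoms out at 4.3135 around step 18000 (epoch ≈ 2.9) and climbs steadily afterwards while the training loss keeps falling, a classic overfitting pattern; the final loss of 4.7906 reflects the end of training rather than the best checkpoint. One way to visualize the divergence, assuming the table above has been exported to a hypothetical training_log.csv with the same four columns:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the table above with columns:
# train_loss, epoch, step, val_loss
log = pd.read_csv("training_log.csv")

# The first row's training loss is "No log"; coerce it to NaN.
log["train_loss"] = pd.to_numeric(log["train_loss"], errors="coerce")

plt.plot(log["step"], log["train_loss"], label="training loss")
plt.plot(log["step"], log["val_loss"], label="validation loss")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```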

Framework versions

  • Transformers 4.53.1
  • Pytorch 2.7.0+cu126
  • Datasets 3.6.0
  • Tokenizers 0.21.1
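
A minimal loading sketch, assuming the checkpoint is a causal language model (which the perplexity-style loss suggests); the repository id is a placeholder, since the namespace is not stated on this card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-username/mixtral_5_6gpu_new_settings_h100"  # placeholder id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

inputs = tokenizer("Hello, world!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```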
Model size: 0.9B parameters (F32 tensors, Safetensors format)
