# mixtral_5_6gpu
This model is a fine-tuned version of an unspecified base model, trained on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.3696
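Assuming the reported loss is the mean token-level cross-entropy in nats (the usual convention for causal-LM evaluation in Transformers), the corresponding perplexity is just its exponential. A minimal sketch:

```python
import math

# Final validation loss from the results table below (assumed to be
# cross-entropy in nats per token).
eval_loss = 4.3696
perplexity = math.exp(eval_loss)
print(f"perplexity ≈ {perplexity:.1f}")  # ≈ 79
```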
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 4
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 6
- gradient_accumulation_steps: 16
- total_train_batch_size: 384
- total_eval_batch_size: 48
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 40746
- mixed_precision_training: Native AMP
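The two "total" batch sizes above are derived, not set directly: the effective train batch size is the per-device batch size multiplied by the number of GPUs and the gradient-accumulation steps, while the effective eval batch size (no accumulation) is per-device size times device count. A quick check against the values listed:

```python
# Effective train batch size = per-device batch × num GPUs × grad-accumulation steps
train_batch_size = 4
num_devices = 6
gradient_accumulation_steps = 16
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 384

# Effective eval batch size = per-device batch × num GPUs (no accumulation at eval)
eval_batch_size = 8
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)  # 48
```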
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9761 |
| 7.1488 | 0.2454 | 1000 | 6.9551 |
| 5.9011 | 0.4908 | 2000 | 5.8183 |
| 5.4187 | 0.7363 | 3000 | 5.3778 |
| 5.1765 | 0.9817 | 4000 | 5.1484 |
| 4.983 | 1.2270 | 5000 | 5.0035 |
| 4.876 | 1.4724 | 6000 | 4.8925 |
| 4.7906 | 1.7179 | 7000 | 4.7991 |
| 4.7131 | 1.9633 | 8000 | 4.7258 |
| 4.5733 | 2.2086 | 9000 | 4.6749 |
| 4.5394 | 2.4540 | 10000 | 4.6248 |
| 4.5068 | 2.6995 | 11000 | 4.5808 |
| 4.469 | 2.9449 | 12000 | 4.5393 |
| 4.3381 | 3.1902 | 13000 | 4.5207 |
| 4.3277 | 3.4356 | 14000 | 4.4930 |
| 4.3198 | 3.6810 | 15000 | 4.4654 |
| 4.2995 | 3.9265 | 16000 | 4.4391 |
| 4.1697 | 4.1718 | 17000 | 4.4364 |
| 4.1779 | 4.4172 | 18000 | 4.4203 |
| 4.1732 | 4.6626 | 19000 | 4.4012 |
| 4.1631 | 4.9081 | 20000 | 4.3828 |
| 4.0294 | 5.1534 | 21000 | 4.3887 |
| 4.0533 | 5.3988 | 22000 | 4.3801 |
| 4.0511 | 5.6442 | 23000 | 4.3681 |
| 4.0532 | 5.8897 | 24000 | 4.3559 |
| 3.9201 | 6.1350 | 25000 | 4.3686 |
| 3.9407 | 6.3804 | 26000 | 4.3653 |
| 3.9511 | 6.6258 | 27000 | 4.3558 |
| 3.9468 | 6.8712 | 28000 | 4.3467 |
| 3.8237 | 7.1166 | 29000 | 4.3628 |
| 3.8449 | 7.3620 | 30000 | 4.3622 |
| 3.8537 | 7.6074 | 31000 | 4.3554 |
| 3.8602 | 7.8528 | 32000 | 4.3491 |
| 3.7498 | 8.0982 | 33000 | 4.3658 |
| 3.7648 | 8.3436 | 34000 | 4.3675 |
| 3.7633 | 8.5890 | 35000 | 4.3641 |
| 3.7766 | 8.8344 | 36000 | 4.3592 |
| 3.6848 | 9.0798 | 37000 | 4.3705 |
| 3.6937 | 9.3252 | 38000 | 4.3738 |
| 3.6984 | 9.5706 | 39000 | 4.3721 |
| 3.7008 | 9.8160 | 40000 | 4.3701 |
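With the linear scheduler and 500 warmup steps listed above, the learning rate ramps from 0 to the peak of 1e-4 over the first 500 steps, then decays linearly to 0 at step 40746. A minimal sketch of that schedule (mirroring the behavior of `get_linear_schedule_with_warmup` in Transformers):

```python
def linear_lr(step, base_lr=1e-4, warmup_steps=500, total_steps=40746):
    """Learning rate at a given optimizer step: linear warmup, then linear decay to 0."""
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * max(0.0, (total_steps - step) / (total_steps - warmup_steps))

print(linear_lr(500))    # peak learning rate: 0.0001
print(linear_lr(40746))  # fully decayed: 0.0
```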
### Framework versions
- Transformers 4.53.1
- Pytorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1