# mixtral_5_6gpu_new_settings_h100
This model is a fine-tuned version of an unspecified base model on the arrow dataset. It achieves the following results on the evaluation set:
- Loss: 4.7906
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 2
- gradient_accumulation_steps: 16
- total_train_batch_size: 256
- total_eval_batch_size: 16
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-06; no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 500
- training_steps: 40746
- mixed_precision_training: Native AMP
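The settings above imply an effective batch size of 8 × 2 × 16 = 256 and a linear-warmup/linear-decay learning-rate schedule. A minimal sketch of both, assuming the standard behavior of the "linear" scheduler (the actual run used the built-in Transformers implementation, not this reimplementation):

```python
# Hyperparameters copied from the list above.
LEARNING_RATE = 1e-4
WARMUP_STEPS = 500
TRAINING_STEPS = 40746

# Effective batch size: per-device batch x devices x gradient accumulation.
per_device_batch = 8
num_devices = 2
grad_accum = 16
total_train_batch_size = per_device_batch * num_devices * grad_accum  # 256

def lr_at(step: int) -> float:
    """Linear warmup to the peak LR, then linear decay to zero at the final step."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    remaining = max(0, TRAINING_STEPS - step)
    return LEARNING_RATE * remaining / (TRAINING_STEPS - WARMUP_STEPS)
```

For example, `lr_at(250)` is half the peak rate, `lr_at(500)` is the full 1e-4, and the rate reaches zero at step 40746.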
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|---|---|---|---|
| No log | 0 | 0 | 10.9681 |
| 7.3907 | 0.1636 | 1000 | 7.1875 |
| 5.9337 | 0.3272 | 2000 | 5.8414 |
| 5.3958 | 0.4908 | 3000 | 5.3494 |
| 5.1307 | 0.6545 | 4000 | 5.1029 |
| 4.9703 | 0.8181 | 5000 | 4.9405 |
| 4.8382 | 0.9817 | 6000 | 4.8176 |
| 4.6294 | 1.1453 | 7000 | 4.7346 |
| 4.58 | 1.3089 | 8000 | 4.6590 |
| 4.5336 | 1.4725 | 9000 | 4.5953 |
| 4.4704 | 1.6361 | 10000 | 4.5386 |
| 4.4342 | 1.7997 | 11000 | 4.4860 |
| 4.3908 | 1.9634 | 12000 | 4.4411 |
| 4.1105 | 2.1270 | 13000 | 4.4373 |
| 4.1026 | 2.2906 | 14000 | 4.4111 |
| 4.0997 | 2.4542 | 15000 | 4.3853 |
| 4.0912 | 2.6178 | 16000 | 4.3593 |
| 4.0791 | 2.7814 | 17000 | 4.3366 |
| 4.0683 | 2.9450 | 18000 | 4.3135 |
| 3.6831 | 3.1086 | 19000 | 4.3660 |
| 3.7023 | 3.2723 | 20000 | 4.3685 |
| 3.7204 | 3.4359 | 21000 | 4.3607 |
| 3.7164 | 3.5995 | 22000 | 4.3513 |
| 3.723 | 3.7631 | 23000 | 4.3397 |
| 3.7212 | 3.9267 | 24000 | 4.3282 |
| 3.2371 | 4.0903 | 25000 | 4.4193 |
| 3.2627 | 4.2539 | 26000 | 4.4511 |
| 3.2844 | 4.4175 | 27000 | 4.4612 |
| 3.2914 | 4.5812 | 28000 | 4.4673 |
| 3.3027 | 4.7448 | 29000 | 4.4678 |
| 3.3098 | 4.9084 | 30000 | 4.4663 |
| 2.7994 | 5.0720 | 31000 | 4.5715 |
| 2.8213 | 5.2356 | 32000 | 4.6152 |
| 2.8343 | 5.3992 | 33000 | 4.6370 |
| 2.8467 | 5.5628 | 34000 | 4.6507 |
| 2.8512 | 5.7264 | 35000 | 4.6595 |
| 2.8536 | 5.8901 | 36000 | 4.6647 |
| 2.4456 | 6.0537 | 37000 | 4.7375 |
| 2.4629 | 6.2173 | 38000 | 4.7686 |
| 2.4635 | 6.3809 | 39000 | 4.7827 |
| 2.4631 | 6.5445 | 40000 | 4.7894 |
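The table shows validation loss bottoming out during epoch 3 and climbing afterwards while training loss keeps falling, the usual signature of overfitting. A quick check using step/loss pairs copied from the table (a subset, for brevity):

```python
# Validation loss by step, copied from selected rows of the table above.
eval_loss = {
     6000: 4.8176,
    12000: 4.4411,
    18000: 4.3135,
    24000: 4.3282,
    30000: 4.4663,
    36000: 4.6647,
    40000: 4.7894,
}

# The best checkpoint by validation loss is at step 18000 (epoch ~2.9),
# well before training ended at step 40746.
best_step = min(eval_loss, key=eval_loss.get)
```

If the intent is downstream use, the step-18000 checkpoint (or early stopping around epoch 3) is likely preferable to the final one reported at the top of this card.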
### Framework versions

- Transformers 4.53.1
- PyTorch 2.7.0+cu126
- Datasets 3.6.0
- Tokenizers 0.21.1