train_mnli_1754652132

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the MNLI dataset. It achieves the following results on the evaluation set:

Loss: 0.2827
Num Input Tokens Seen: 347859920

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
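
To make the schedule implied by these hyperparameters concrete, the following is a minimal sketch of linear warmup followed by cosine decay, written in plain Python. The total step count is taken from the last row of the training results table. The rounding of the warmup step count and the function name `cosine_lr` are assumptions for illustration, and the exact scheduler implementation used for this run is not stated in the card.

```python
import math

BASE_LR = 5e-5          # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 883_580   # final Step value in the training results table


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    # Assumption: warmup length is the rounded product of ratio and total steps.
    warmup_steps = round(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to BASE_LR over the warmup phase.
        return BASE_LR * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few steps across the run.
for step in (0, 44_179, 88_358, 485_969, 883_580):
    print(f"step {step:>7}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the warmup phase covers the first epoch of the ten, so the peak learning rate is reached around step 88358.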

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3408	0.5	44179	0.3464	17403808
0.3304	1.0	88358	0.3557	34786008
0.3219	1.5	132537	0.3367	52165240
0.3141	2.0	176716	0.3337	69564424
0.3159	2.5	220895	0.3378	86951080
0.2659	3.0	265074	0.3170	104352808
0.3612	3.5	309253	0.3060	121746504
0.2903	4.0	353432	0.3037	139123792
0.2470	4.5	397611	0.2995	156526672
0.2781	5.0	441790	0.2941	173916408
0.3370	5.5	485969	0.2924	191309592
0.2202	6.0	530148	0.2921	208701328
0.3108	6.5	574327	0.2903	226098768
0.2501	7.0	618506	0.2863	243493272
0.2648	7.5	662685	0.2848	260881240
0.2872	8.0	706864	0.2836	278276232
0.2503	8.5	751043	0.2834	295687496
0.2881	9.0	795222	0.2829	313062872
0.3381	9.5	839401	0.2827	330444056
0.2779	10.0	883580	0.2828	347859920
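
The table can be summarized programmatically. The sketch below copies the rows above into a string, parses them, and reports the checkpoint with the lowest validation loss along with two derived quantities (steps per epoch and average input tokens per step). The `Row` type and the helper `parse_rows` are hypothetical names introduced only for this example.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


# Rows copied from the training results table (whitespace-separated).
RAW = """
0.3408 0.5 44179 0.3464 17403808
0.3304 1.0 88358 0.3557 34786008
0.3219 1.5 132537 0.3367 52165240
0.3141 2.0 176716 0.3337 69564424
0.3159 2.5 220895 0.3378 86951080
0.2659 3.0 265074 0.3170 104352808
0.3612 3.5 309253 0.3060 121746504
0.2903 4.0 353432 0.3037 139123792
0.247 4.5 397611 0.2995 156526672
0.2781 5.0 441790 0.2941 173916408
0.337 5.5 485969 0.2924 191309592
0.2202 6.0 530148 0.2921 208701328
0.3108 6.5 574327 0.2903 226098768
0.2501 7.0 618506 0.2863 243493272
0.2648 7.5 662685 0.2848 260881240
0.2872 8.0 706864 0.2836 278276232
0.2503 8.5 751043 0.2834 295687496
0.2881 9.0 795222 0.2829 313062872
0.3381 9.5 839401 0.2827 330444056
0.2779 10.0 883580 0.2828 347859920
"""


def parse_rows(raw: str) -> List[Row]:
    """Turn the whitespace-separated table text into typed rows."""
    rows = []
    for line in raw.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append(Row(float(train_loss), float(epoch), int(step),
                        float(val_loss), int(tokens)))
    return rows


rows = parse_rows(RAW)
best = min(rows, key=lambda r: r.val_loss)   # lowest validation loss
last = rows[-1]                              # final logged evaluation
steps_per_epoch = last.step / last.epoch
tokens_per_step = last.tokens_seen / last.step

print(f"best validation loss {best.val_loss} at epoch {best.epoch}, step {best.step}")
print(f"steps per epoch: {steps_per_epoch:.0f}")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The lowest validation loss in the table matches the evaluation loss reported at the top of the card, which suggests the headline figure refers to the best logged evaluation rather than the final one.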

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1
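
Since PEFT appears in the framework versions and the model is published as an adapter on top of the base model, loading it might look like the sketch below. This is an assumption-laden example, not a documented usage: it assumes the adapter lives at `rbelanec/train_mnli_1754652132`, that the base model should be loaded as a causal language model, and that you have access to the base weights. The function name `load_model_and_tokenizer` is hypothetical, and the prompt format used during fine-tuning is not described in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in the card
ADAPTER_ID = "rbelanec/train_mnli_1754652132"          # assumed adapter repository id


def load_model_and_tokenizer(base_id: str = BASE_MODEL_ID,
                             adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the fine-tuned PEFT adapter to it."""
    # Imported lazily so that defining this function does not require
    # the heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter was trained on the causal-LM form of the base model.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # switch off dropout for inference
    return model, tokenizer
```

Calling `load_model_and_tokenizer()` downloads the full base weights, so it needs substantial disk space and memory.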
