# train_mnli_1755694486
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the mnli dataset. It achieves the following results on the evaluation set:
- Loss: 0.2973
- Num Input Tokens Seen: 312972112
## Model description
More information needed
## Intended uses & limitations
More information needed
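Detailed usage notes are not provided, but the adapter can presumably be loaded on top of the base model with PEFT. A minimal sketch, assuming this repository contains a standard PEFT adapter (e.g. LoRA) trained with the stack listed under Framework versions; the prompt below is only illustrative, since the prompt format used during fine-tuning is not documented on this card:

```python
# Minimal sketch: load the base model and apply the PEFT adapter from this repo.
# Assumes the repo holds a standard PEFT adapter (e.g. LoRA); requires peft,
# transformers, and accelerate (for device_map="auto").
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_mnli_1755694486"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto", device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Illustrative inference call; the exact prompt format used in training is not
# documented here, so treat this input as a placeholder.
prompt = "Premise: A man is playing a guitar.\nHypothesis: A man is performing music.\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```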
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a hedged `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
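As a rough sketch, these values map onto `transformers.TrainingArguments` as shown below; settings not stated on this card (such as `output_dir`) are placeholders, and the actual training script may have configured more than is listed here:

```python
# Sketch: the hyperparameters above expressed as transformers.TrainingArguments.
# Values not listed on this card (e.g. output_dir) are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_mnli_1755694486",  # placeholder
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```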
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.0454 | 0.5 | 88358 | 0.1573 | 15656400 |
| 0.0722 | 1.0 | 176716 | 0.1077 | 31302832 |
| 0.3265 | 1.5 | 265074 | 0.3248 | 46945024 |
| 0.3418 | 2.0 | 353432 | 0.3202 | 62598968 |
| 0.2511 | 2.5 | 441790 | 0.3285 | 78243624 |
| 0.2476 | 3.0 | 530148 | 0.3304 | 93900752 |
| 0.1914 | 3.5 | 618506 | 0.3092 | 109555344 |
| 0.2905 | 4.0 | 706864 | 0.3047 | 125196704 |
| 0.2523 | 4.5 | 795222 | 0.3154 | 140844896 |
| 0.2946 | 5.0 | 883580 | 0.3051 | 156493064 |
| 0.2024 | 5.5 | 971938 | 0.3068 | 172140360 |
| 0.3655 | 6.0 | 1060296 | 0.3063 | 187789496 |
| 0.3655 | 6.5 | 1148654 | 0.3115 | 203440440 |
| 0.2426 | 7.0 | 1237012 | 0.3030 | 219083952 |
| 0.2702 | 7.5 | 1325370 | 0.3020 | 234732208 |
| 0.2653 | 8.0 | 1413728 | 0.2988 | 250382016 |
| 0.297 | 8.5 | 1502086 | 0.2984 | 266047408 |
| 0.2656 | 9.0 | 1590444 | 0.2974 | 281673536 |
| 0.3246 | 9.5 | 1678802 | 0.2975 | 297311136 |
| 0.2602 | 10.0 | 1767160 | 0.2973 | 312972112 |
### Framework versions
- PEFT 0.15.2
- Transformers 4.51.3
- Pytorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1