# train_winogrande_456_1760637846
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the winogrande dataset. It achieves the following results on the evaluation set:
- Loss: 0.0648
- Num Input Tokens Seen: 38395408
## Model description
More information needed
## Intended uses & limitations
More information needed
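No usage details are provided, but since this is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct (see the framework versions below), a minimal inference sketch might look like the following. The adapter repo id is an assumption:

```python
# Minimal sketch: load the PEFT adapter on top of the base model.
# "rbelanec/train_winogrande_456_1760637846" is the assumed adapter repo id.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_winogrande_456_1760637846"  # assumption

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = "The trophy does not fit in the suitcase because it is too big."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```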
## Training and evaluation data
More information needed
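The data is not described here, but `winogrande` presumably refers to the WinoGrande commonsense-reasoning benchmark on the Hugging Face Hub. A hedged loading sketch (the `winogrande_xl` configuration is an assumption; the card does not say which configuration or split was used):

```python
# Sketch: load WinoGrande from the Hub; the configuration name is an assumption.
from datasets import load_dataset

ds = load_dataset("allenai/winogrande", "winogrande_xl")
print(ds["train"][0])  # fields: sentence, option1, option2, answer
```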
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
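For reference, these values map onto `transformers.TrainingArguments` roughly as in this sketch (the output directory and anything not listed above are assumptions):

```python
# Sketch: the listed hyperparameters expressed as TrainingArguments.
# Only the values from the list above come from the card; the rest is assumed.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_winogrande_456_1760637846",  # assumed
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```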
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1968 | 1.0 | 9090 | 0.1597 | 1919808 |
| 0.0788 | 2.0 | 18180 | 0.0975 | 3839104 |
| 0.0772 | 3.0 | 27270 | 0.0804 | 5758016 |
| 0.0358 | 4.0 | 36360 | 0.0748 | 7678560 |
| 0.0224 | 5.0 | 45450 | 0.0699 | 9598912 |
| 0.1415 | 6.0 | 54540 | 0.0685 | 11518656 |
| 0.0745 | 7.0 | 63630 | 0.0664 | 13438320 |
| 0.0214 | 8.0 | 72720 | 0.0658 | 15358064 |
| 0.1006 | 9.0 | 81810 | 0.0648 | 17278064 |
| 0.2738 | 10.0 | 90900 | 0.0699 | 19196144 |
| 0.0162 | 11.0 | 99990 | 0.0696 | 21117200 |
| 0.0013 | 12.0 | 109080 | 0.0718 | 23037584 |
| 0.0559 | 13.0 | 118170 | 0.0793 | 24956720 |
| 0.0916 | 14.0 | 127260 | 0.0801 | 26875344 |
| 0.0007 | 15.0 | 136350 | 0.0828 | 28793344 |
| 0.0226 | 16.0 | 145440 | 0.0860 | 30713568 |
| 0.0427 | 17.0 | 154530 | 0.0895 | 32635088 |
| 0.0093 | 18.0 | 163620 | 0.0888 | 34555376 |
| 0.1062 | 19.0 | 172710 | 0.0889 | 36474544 |
| 0.0049 | 20.0 | 181800 | 0.0879 | 38395408 |
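Validation loss bottoms out at epoch 9 (0.0648, the figure reported above) and rises steadily afterward, so the later checkpoints appear to overfit. An early-stopping setup would capture the best checkpoint automatically; the sketch below is an assumption, not part of the original run:

```python
# Sketch (assumed, not from the original run): stop when eval_loss stops improving.
from transformers import EarlyStoppingCallback, TrainingArguments

args = TrainingArguments(
    output_dir="train_winogrande_456_1760637846",  # assumed
    eval_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,        # required for early stopping
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)
callback = EarlyStoppingCallback(early_stopping_patience=3)
# Pass as Trainer(..., callbacks=[callback]) alongside the usual model/datasets.
```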
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4