train_winogrande_42_1760637617

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the winogrande dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0624 (the lowest validation loss reached during training, at epoch 9; see Training results below)
  • Num Input Tokens Seen: 38397712
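
To use the adapter, load it on top of the base model with PEFT. Below is a minimal inference sketch, assuming the adapter is published at rbelanec/train_winogrande_42_1760637617 (the repository this card describes) and that you have access to the gated base model; the prompt is an illustrative Winograd-schema-style example, not taken from the evaluation set:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_winogrande_42_1760637617"  # this repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
# Attach the LoRA adapter weights from this repo to the frozen base model.
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

prompt = (
    "The trophy doesn't fit in the suitcase because it is too large. "
    "What is too large?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```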

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
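
The list above maps directly onto transformers.TrainingArguments; a minimal sketch follows (output_dir is illustrative, and the dataset preprocessing and LoRA configuration are not specified by this card):

```python
from transformers import TrainingArguments

# Values below are copied from the hyperparameter list; betas=(0.9, 0.999)
# and epsilon=1e-08 are the defaults of the adamw_torch optimizer.
training_args = TrainingArguments(
    output_dir="train_winogrande_42_1760637617",  # illustrative path
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```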

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1334        | 1.0   | 9090   | 0.1537          | 1918960           |
| 0.0536        | 2.0   | 18180  | 0.0937          | 3839712           |
| 0.1154        | 3.0   | 27270  | 0.0785          | 5759216           |
| 0.0843        | 4.0   | 36360  | 0.0720          | 7678944           |
| 0.027         | 5.0   | 45450  | 0.0666          | 9598112           |
| 0.0937        | 6.0   | 54540  | 0.0648          | 11518608          |
| 0.0298        | 7.0   | 63630  | 0.0654          | 13438816          |
| 0.1602        | 8.0   | 72720  | 0.0671          | 15359200          |
| 0.0343        | 9.0   | 81810  | 0.0624          | 17280320          |
| 0.0127        | 10.0  | 90900  | 0.0682          | 19200384          |
| 0.0139        | 11.0  | 99990  | 0.0700          | 21120032          |
| 0.0841        | 12.0  | 109080 | 0.0704          | 23039856          |
| 0.008         | 13.0  | 118170 | 0.0751          | 24959536          |
| 0.0024        | 14.0  | 127260 | 0.0763          | 26879696          |
| 0.0023        | 15.0  | 136350 | 0.0800          | 28798160          |
| 0.0142        | 16.0  | 145440 | 0.0813          | 30718896          |
| 0.0054        | 17.0  | 154530 | 0.0824          | 32638160          |
| 0.0656        | 18.0  | 163620 | 0.0840          | 34558000          |
| 0.0764        | 19.0  | 172710 | 0.0848          | 36477680          |
| 0.038         | 20.0  | 181800 | 0.0833          | 38397712          |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
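
To reproduce the environment, here is a requirements-style pin matching the versions above (standard PyPI package names assumed; the listed torch build, 2.9.0+cu128, is CUDA 12.8-specific, so the matching PyTorch wheel index may be needed):

```
peft==0.17.1
transformers==4.51.3
torch==2.9.0        # card lists 2.9.0+cu128 (CUDA 12.8 build)
datasets==4.0.0
tokenizers==0.21.4
```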