# train_winogrande_42_1760637615
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the winogrande dataset. It achieves the following results on the evaluation set:

- Loss: 0.0459
- Num input tokens seen: 38,397,712
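
A minimal loading sketch follows. It assumes the PEFT adapter weights are hosted in this repo (`rbelanec/train_winogrande_42_1760637615`) and target the base model above; the exact prompt template used during training is not documented in this card, so the prompt below is only a placeholder:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_winogrande_42_1760637615"
base_id = "meta-llama/Meta-Llama-3-8B-Instruct"

# Load the base model with the adapter applied on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# The tokenizer is taken from the base model, since the adapter repo
# may not ship its own copy (an assumption).
tokenizer = AutoTokenizer.from_pretrained(base_id)

# WinoGrande items are binary fill-in-the-blank sentences; this prompt
# format is illustrative, not the one used for fine-tuning.
prompt = (
    "Sentence: The trophy doesn't fit in the suitcase because it is too big.\n"
    "Which is too big, the trophy or the suitcase?\nAnswer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```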
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows this list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
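
For reference, here is a rough `transformers.TrainingArguments` equivalent of the list above. This is a hedged reconstruction: the `output_dir` and any settings not listed in the card (gradient accumulation, eval/save strategy, the PEFT/LoRA config itself) are assumptions, not values taken from the training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_winogrande_42_1760637615",  # assumed, not documented
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```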
### Training results

Validation loss reaches its minimum of 0.0459 at epoch 2 (the value reported at the top of this card) and rises steadily afterwards while the training loss collapses to zero, indicating overfitting beyond the early epochs.
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.0822 | 1.0 | 9090 | 0.0800 | 1918960 |
| 0.097 | 2.0 | 18180 | 0.0459 | 3839712 |
| 0.068 | 3.0 | 27270 | 0.0567 | 5759216 |
| 0.0531 | 4.0 | 36360 | 0.0683 | 7678944 |
| 0.0001 | 5.0 | 45450 | 0.0838 | 9598112 |
| 0.1417 | 6.0 | 54540 | 0.0935 | 11518608 |
| 0.0 | 7.0 | 63630 | 0.0885 | 13438816 |
| 0.0007 | 8.0 | 72720 | 0.0733 | 15359200 |
| 0.0 | 9.0 | 81810 | 0.1085 | 17280320 |
| 0.0 | 10.0 | 90900 | 0.0950 | 19200384 |
| 0.0011 | 11.0 | 99990 | 0.0932 | 21120032 |
| 0.0 | 12.0 | 109080 | 0.0910 | 23039856 |
| 0.0 | 13.0 | 118170 | 0.1305 | 24959536 |
| 0.0 | 14.0 | 127260 | 0.1208 | 26879696 |
| 0.0 | 15.0 | 136350 | 0.1393 | 28798160 |
| 0.0 | 16.0 | 145440 | 0.1550 | 30718896 |
| 0.0 | 17.0 | 154530 | 0.1616 | 32638160 |
| 0.0 | 18.0 | 163620 | 0.1718 | 34558000 |
| 0.0 | 19.0 | 172710 | 0.1722 | 36477680 |
| 0.0 | 20.0 | 181800 | 0.1724 | 38397712 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4