train_winogrande_123_1760637731

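This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the winogrande dataset. It achieves the following results on the evaluation set (the loss is the lowest validation loss in the training results, reached at epoch 12; the token count is the total at the end of training):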

Loss: 6.6467
Num Input Tokens Seen: 38394016
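
For a rough sense of scale, the reported loss can be converted to a perplexity. The sketch below assumes the loss is a mean per-token cross-entropy in nats, which this card does not state; the variable names are illustrative.

```python
import math

# Evaluation loss reported above. Assumed to be a mean per-token
# cross-entropy in nats (an assumption; the card does not say).
eval_loss = 6.6467

# Perplexity is the exponential of the mean cross-entropy.
perplexity = math.exp(eval_loss)

print(f"perplexity ~ {perplexity:.0f}")
```

Under that assumption the perplexity is in the high hundreds.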

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
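
The schedule settings above (cosine decay with a warmup ratio of 0.1) can be sketched in plain Python. This is an illustration under stated assumptions, not the training code: it assumes linear warmup from zero, a half-cosine decay to zero, and a total of 181800 optimizer steps (the final step in the training results table). The function name `lr_at` is hypothetical.

```python
import math

# Values taken from the hyperparameter list and the training results table.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 181800  # final step in the training results table

# Assumption: warmup length is the warmup ratio applied to the total steps.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Linear warmup to PEAK_LR, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the learning rate at a few points along the run.
for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(step, f"{lr_at(step):.3e}")
```

With these numbers the warmup covers the first 18180 steps, which is exactly the first two epochs in the results table.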

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
7.1427	1.0	9090	7.1531	1918144
6.8385	2.0	18180	6.7278	3838192
6.5905	3.0	27270	6.6631	5757648
6.8395	4.0	36360	6.6514	7676976
6.5852	5.0	45450	6.6536	9596496
6.6834	6.0	54540	6.6556	11516256
7.2351	7.0	63630	6.6681	13435600
6.7346	8.0	72720	6.6502	15356752
6.8398	9.0	81810	6.6523	17276752
6.5886	10.0	90900	6.6496	19196064
6.6816	11.0	99990	6.6496	21115472
6.5914	12.0	109080	6.6467	23035440
6.5562	13.0	118170	6.6496	24955600
6.4908	14.0	127260	6.6496	26875344
6.6743	15.0	136350	6.6496	28795600
6.7742	16.0	145440	6.6496	30715008
6.5507	17.0	154530	6.6496	32634912
6.4493	18.0	163620	6.6496	34554080
6.7687	19.0	172710	6.6496	36472448
6.684	20.0	181800	6.6496	38394016
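
The table above can be summarized with a short script. The sketch below copies the epoch, step, validation loss, and input-tokens columns from the table and reports the best checkpoint and some simple throughput arithmetic. The variable names are illustrative, and the per-step token figure is only an average over the whole run.

```python
# Rows copied from the table above:
# epoch, step, validation loss, input tokens seen.
ROWS = """
1 9090 7.1531 1918144
2 18180 6.7278 3838192
3 27270 6.6631 5757648
4 36360 6.6514 7676976
5 45450 6.6536 9596496
6 54540 6.6556 11516256
7 63630 6.6681 13435600
8 72720 6.6502 15356752
9 81810 6.6523 17276752
10 90900 6.6496 19196064
11 99990 6.6496 21115472
12 109080 6.6467 23035440
13 118170 6.6496 24955600
14 127260 6.6496 26875344
15 136350 6.6496 28795600
16 145440 6.6496 30715008
17 154530 6.6496 32634912
18 163620 6.6496 34554080
19 172710 6.6496 36472448
20 181800 6.6496 38394016
"""

records = []
for line in ROWS.strip().splitlines():
    epoch, step, val_loss, tokens = line.split()
    records.append((int(epoch), int(step), float(val_loss), int(tokens)))

best = min(records, key=lambda r: r[2])   # row with the lowest validation loss
final = records[-1]                       # last epoch
steps_per_epoch = records[0][1]           # step count at the end of epoch 1
tokens_per_step = final[3] / final[1]     # average over the whole run

print("best epoch:", best[0], "validation loss:", best[2])
print("final validation loss:", final[2])
print("steps per epoch:", steps_per_epoch)
print(f"mean input tokens per step: {tokens_per_step:.1f}")
```

The lowest validation loss occurs at epoch 12. From epoch 10 onward, every other row reports the same validation loss of 6.6496.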

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
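
With PEFT and Transformers installed, an adapter like this one is typically attached to its base model at load time. The following is a minimal sketch, not a tested recipe for this repository. It assumes the adapter is published under the repository id shown in this card's model tree, that you have access to the gated base model, and that enough memory is available. The function name `load_adapter_model` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumption: adapter repository id as shown in this card's model tree.
ADAPTER_ID = "rbelanec/train_winogrande_123_1760637731"


def load_adapter_model():
    """Load the base model and attach the adapter (downloads weights on first call)."""
    # Imported lazily so that defining this function does not require
    # the heavy dependencies to be installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype="auto")

    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` returns objects that can be used like any other causal language model in Transformers.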

Model tree for rbelanec/train_winogrande_123_1760637731

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model