# train_hellaswag_123_1760637742
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:
- Loss: 0.0586
- Num Input Tokens Seen: 218506144
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
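
The training script itself is not included with the card. As a point of reference, here is a minimal sketch of how the values above might map onto `transformers.TrainingArguments`; the `output_dir` and any settings not listed in the card are assumptions:

```python
from transformers import TrainingArguments

# Hedged reconstruction of the hyperparameter list above; output_dir and
# anything not named in the card (logging, saving, etc.) are assumptions.
training_args = TrainingArguments(
    output_dir="train_hellaswag_123_1760637742",  # hypothetical output path
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```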
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1382 | 1.0 | 8979 | 0.0586 | 10932896 |
| 0.0004 | 2.0 | 17958 | 0.0653 | 21856400 |
| 0.0015 | 3.0 | 26937 | 0.0697 | 32797696 |
| 0.0002 | 4.0 | 35916 | 0.0844 | 43715520 |
| 0.0001 | 5.0 | 44895 | 0.0924 | 54639040 |
| 0.0001 | 6.0 | 53874 | 0.1024 | 65562352 |
| 0.0 | 7.0 | 62853 | 0.1465 | 76495264 |
| 0.0 | 8.0 | 71832 | 0.1089 | 87424000 |
| 0.0009 | 9.0 | 80811 | 0.1000 | 98355744 |
| 0.0001 | 10.0 | 89790 | 0.1043 | 109279616 |
| 0.0 | 11.0 | 98769 | 0.1221 | 120190896 |
| 0.0 | 12.0 | 107748 | 0.1171 | 131118336 |
| 0.0 | 13.0 | 116727 | 0.1245 | 142033584 |
| 0.0 | 14.0 | 125706 | 0.1783 | 152960704 |
| 0.0 | 15.0 | 134685 | 0.1853 | 163884192 |
| 0.0 | 16.0 | 143664 | 0.1957 | 174816592 |
| 0.0 | 17.0 | 152643 | 0.1970 | 185740864 |
| 0.0 | 18.0 | 161622 | 0.2032 | 196657440 |
| 0.0 | 19.0 | 170601 | 0.2059 | 207581424 |
| 0.0 | 20.0 | 179580 | 0.2064 | 218506144 |

Validation loss is lowest after epoch 1 (0.0586, matching the evaluation loss reported above) and rises steadily over the remaining epochs, which suggests the reported checkpoint corresponds to the end of epoch 1.
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
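
Since PEFT appears in the framework versions, this repository most likely hosts an adapter rather than full model weights. A minimal loading sketch, assuming the adapter lives at `rbelanec/train_hellaswag_123_1760637742` (the repo id shown on the hub page) and that you have access to the gated base model:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_123_1760637742"  # assumed adapter repo

# Load the base model, then attach the fine-tuned adapter weights on top.
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Example: greedily complete a HellaSwag-style context.
prompt = "A man is sitting on a roof. He"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=20, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```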