# train_hellaswag_789_1760637969
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the hellaswag dataset. It achieves the following results on the evaluation set (a loading sketch follows the results):
- Loss: 2.2132
- Num Input Tokens Seen: 218389184
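A minimal loading sketch, assuming this repo hosts a PEFT adapter on top of the gated base model; the prompt and generation settings are illustrative, not taken from the training setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_789_1760637969"  # this repo

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter

# HellaSwag-style sentence continuation (illustrative prompt)
prompt = "A man is sitting on a roof. He"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Note that the base model is gated, so you need accepted access on the Hub before `from_pretrained` will download it.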
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after this list):
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 789
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
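A sketch of a `TrainingArguments` configuration mirroring the list above; the training script itself is not part of this card, so the output directory name and anything not listed are assumptions:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_hellaswag_789_1760637969",  # assumed name
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",         # defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```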
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.0442 | 1.0 | 8979 | 0.0831 | 10925792 |
| 0.003 | 2.0 | 17958 | 0.0693 | 21844800 |
| 0.0403 | 3.0 | 26937 | 0.0661 | 32755216 |
| 0.1296 | 4.0 | 35916 | 0.0586 | 43680624 |
| 0.036 | 5.0 | 44895 | 0.0616 | 54591088 |
| 0.0892 | 6.0 | 53874 | 0.0596 | 65516928 |
| 0.0221 | 7.0 | 62853 | 0.0628 | 76450992 |
| 0.0209 | 8.0 | 71832 | 0.0552 | 87373568 |
| 0.0276 | 9.0 | 80811 | 0.0595 | 98295904 |
| 0.0657 | 10.0 | 89790 | 0.0636 | 109213952 |
| 0.0008 | 11.0 | 98769 | 0.0762 | 120131232 |
| 0.0051 | 12.0 | 107748 | 0.0797 | 131040032 |
| 0.0005 | 13.0 | 116727 | 0.0977 | 141960608 |
| 0.0001 | 14.0 | 125706 | 0.1072 | 152870816 |
| 0.0002 | 15.0 | 134685 | 0.1132 | 163783840 |
| 0.0002 | 16.0 | 143664 | 0.1273 | 174703760 |
| 0.0002 | 17.0 | 152643 | 0.1281 | 185627024 |
| 0.0003 | 18.0 | 161622 | 0.1255 | 196546368 |
| 0.0002 | 19.0 | 170601 | 0.1294 | 207466096 |
| 0.0004 | 20.0 | 179580 | 0.1303 | 218389184 |
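Validation loss reaches its minimum (0.0552) at epoch 8 and rises steadily afterwards while the training loss collapses toward zero, which suggests the later epochs overfit; if intermediate checkpoints were saved, the epoch-8 checkpoint is likely the strongest.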
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4