# train_hellaswag_123_1768397592
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. It achieves the following results on the evaluation set:
- Loss: 0.0957
- Num Input Tokens Seen: 99752256
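The framework versions below include PEFT, so this checkpoint is most likely a parameter-efficient adapter rather than full model weights. A minimal inference sketch, assuming a LoRA-style adapter hosted under this repo id and the standard `transformers` + `peft` loading path:

```python
# A minimal sketch, not an official usage example: assumes this checkpoint
# is a PEFT (LoRA) adapter on top of Meta-Llama-3-8B-Instruct.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_hellaswag_123_1768397592"  # this model's repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

prompt = "A man is sitting on a roof. He"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```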
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (an equivalent `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
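The list above maps almost one-to-one onto `transformers` `TrainingArguments`. A minimal sketch of that configuration; the `output_dir` is an assumption, since the actual training script is not published:

```python
# Hedged reconstruction of the hyperparameters listed above as a
# transformers TrainingArguments object. Data loading, the PEFT/LoRA
# config, and the Trainer wiring are omitted because they are not documented.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_hellaswag_123_1768397592",  # assumed, not documented
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```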
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1492 | 0.5000 | 8979 | 0.1932 | 4982272 |
| 0.1247 | 1.0001 | 17958 | 0.1141 | 9982144 |
| 0.0003 | 1.5001 | 26937 | 0.1143 | 14966208 |
| 0.1381 | 2.0001 | 35916 | 0.1120 | 19955568 |
| 0.057 | 2.5001 | 44895 | 0.1184 | 24935680 |
| 0.0005 | 3.0002 | 53874 | 0.0957 | 29935424 |
| 0.2342 | 3.5002 | 62853 | 0.1080 | 34914400 |
| 0.1786 | 4.0002 | 71832 | 0.1074 | 39901200 |
| 0.0001 | 4.5003 | 80811 | 0.1236 | 44885728 |
| 0.0002 | 5.0003 | 89790 | 0.1097 | 49884080 |
| 0.0003 | 5.5003 | 98769 | 0.1193 | 54873456 |
| 0.0001 | 6.0003 | 107748 | 0.1186 | 59861808 |
| 0.001 | 6.5004 | 116727 | 0.1335 | 64855584 |
| 0.0 | 7.0004 | 125706 | 0.1356 | 69848320 |
| 0.0 | 7.5004 | 134685 | 0.1572 | 74839776 |
| 0.2546 | 8.0004 | 143664 | 0.1441 | 79831776 |
| 0.0001 | 8.5005 | 152643 | 0.1576 | 84820208 |
| 0.0 | 9.0005 | 161622 | 0.1536 | 89803776 |
| 0.0001 | 9.5005 | 170601 | 0.1575 | 94782720 |
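The reported evaluation loss of 0.0957 is the minimum in the table, reached at epoch 3; after that, validation loss drifts upward while training loss collapses toward zero, suggesting overfitting in later epochs. A small sketch for visualizing this (assuming `matplotlib` is available; the data points are transcribed from the table above):

```python
# Plot the validation-loss curve from the training-results table.
import matplotlib.pyplot as plt

epochs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0,
          5.5, 6.0, 6.5, 7.0, 7.5, 8.0, 8.5, 9.0, 9.5]
val_loss = [0.1932, 0.1141, 0.1143, 0.1120, 0.1184, 0.0957, 0.1080,
            0.1074, 0.1236, 0.1097, 0.1193, 0.1186, 0.1335, 0.1356,
            0.1572, 0.1441, 0.1576, 0.1536, 0.1575]

plt.plot(epochs, val_loss, marker="o")
plt.axhline(min(val_loss), linestyle="--",
            label=f"best = {min(val_loss):.4f} (epoch 3)")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.legend()
plt.show()
```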
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4