train_hellaswag_789_1760637967

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the hellaswag dataset. At the end of training (epoch 20), it achieves the following results on the evaluation set:

Loss: 0.6736
Num Input Tokens Seen: 194250048

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 1e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
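
To make the scheduler settings above concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the listed learning_rate and lr_scheduler_warmup_ratio. It assumes that the total step count is the 159620 steps reported at epoch 20 in the training results, that the warmup length is the warmup ratio times the total steps, and that the cosine decays to zero over the remaining steps. The helper name lr_at is hypothetical, and the trainer's actual implementation may differ in detail.

```python
import math

PEAK_LR = 1e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 159620   # assumption: final step reported in the training results
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # assumption: ratio times total steps


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate at a given optimizer step."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup ends at step 15962, which coincides with the end of epoch 2 in the training results.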

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4669	2.0	15962	0.4627	19418112
0.4684	4.0	31924	0.4626	38834048
0.4565	6.0	47886	0.4626	58283392
0.4551	8.0	63848	0.4662	77712192
0.4711	10.0	79810	0.4703	97147040
0.4456	12.0	95772	0.4814	116563520
0.4311	14.0	111734	0.5079	135973856
0.4883	16.0	127696	0.5750	155391456
0.2303	18.0	143658	0.6490	174811520
0.2854	20.0	159620	0.6736	194250048
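
The table can be summarized with a few lines of code. The sketch below copies the rows above into a list and reports which logged epoch has the lowest validation loss and how far the final epoch is from it. The variable names are illustrative only; nothing here comes from the training code itself.

```python
# Rows copied from the training results table:
# (training loss, epoch, step, validation loss, input tokens seen)
ROWS = [
    (0.4669, 2.0, 15962, 0.4627, 19418112),
    (0.4684, 4.0, 31924, 0.4626, 38834048),
    (0.4565, 6.0, 47886, 0.4626, 58283392),
    (0.4551, 8.0, 63848, 0.4662, 77712192),
    (0.4711, 10.0, 79810, 0.4703, 97147040),
    (0.4456, 12.0, 95772, 0.4814, 116563520),
    (0.4311, 14.0, 111734, 0.5079, 135973856),
    (0.4883, 16.0, 127696, 0.5750, 155391456),
    (0.2303, 18.0, 143658, 0.6490, 174811520),
    (0.2854, 20.0, 159620, 0.6736, 194250048),
]

# Pick the row with the lowest validation loss; min() keeps the earliest row on ties.
best = min(ROWS, key=lambda row: row[3])
final = ROWS[-1]
best_epoch, best_val = best[1], best[3]
gap = final[3] - best_val

print(f"lowest validation loss: {best_val:.4f} at epoch {best_epoch:g}")
print(f"final validation loss:  {final[3]:.4f} at epoch {final[1]:g}")
print(f"increase from best to final: {gap:.4f}")
```

On these rows the minimum validation loss is 0.4626, logged at both epoch 4 and epoch 6, while the final epoch reports 0.6736. Validation loss rising while training loss falls in the last epochs is the pattern usually associated with overfitting, so an earlier checkpoint may generalize better if one is available.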

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
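
Since PEFT is listed among the framework versions, the repository presumably holds an adapter rather than full model weights. Under that assumption, a minimal loading sketch might look like the following. The repository ID rbelanec/train_hellaswag_789_1760637967 and the helper name load_adapter_model are used for illustration, and the adapter type and any recommended generation settings are not documented in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumption: this repository contains a PEFT adapter for the base model above.
ADAPTER_ID = "rbelanec/train_hellaswag_789_1760637967"


def load_adapter_model():
    """Hypothetical helper: load the base model and attach the adapter."""
    # Imported lazily so the constants can be inspected without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling load_adapter_model() downloads the base model weights, which may require accepting the base model's license on the Hub first.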

Model tree for rbelanec/train_hellaswag_789_1760637967

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model