train_sst2_42_1760637622

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.8224
Num Input Tokens Seen: 67768656

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
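The cosine schedule with a 0.1 warmup ratio can be made concrete by computing the learning rate at any optimizer step. The sketch below is an illustration, not the training code. It assumes the schedule uses the linear-warmup, half-cosine formula of Transformers' get_cosine_schedule_with_warmup (default of half a cycle), and that the warmup length is ceil(warmup_ratio × total steps). The total of 303,080 steps comes from the Step column of the results table (20 epochs × 15,154 steps). The name lr_at is hypothetical.

```python
import math

# Values from this card; the total step count comes from the results table.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
NUM_EPOCHS = 20
STEPS_PER_EPOCH = 15154
TOTAL_STEPS = NUM_EPOCHS * STEPS_PER_EPOCH  # 303080 optimizer steps

# Assumption: warmup length is the ratio rounded up to a whole number of steps.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: linear warmup to PEAK_LR, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 to PEAK_LR over the warmup steps.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS // 2, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup lasts 30,308 steps, which lines up with the end of epoch 2 in the results table. After that point the learning rate decays from 5e-05 toward zero at step 303,080.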

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8503	1.0	15154	0.8988	3383904
0.7166	2.0	30308	0.8461	6774480
0.7149	3.0	45462	0.8269	10163152
0.8593	4.0	60616	0.8305	13552544
0.7253	5.0	75770	0.8279	16941120
0.7993	6.0	90924	0.8240	20331520
0.6568	7.0	106078	0.8246	23721840
0.5681	8.0	121232	0.8233	27111168
0.8977	9.0	136386	0.8304	30498912
0.9404	10.0	151540	0.8224	33886240
0.7244	11.0	166694	0.8290	37277536
0.6772	12.0	181848	0.8292	40664576
0.9906	13.0	197002	0.8227	44053792
0.6141	14.0	212156	0.8283	47441984
0.8149	15.0	227310	0.8283	50830480
0.8214	16.0	242464	0.8283	54215696
0.7346	17.0	257618	0.8283	57603296
0.8047	18.0	272772	0.8283	60987568
0.9349	19.0	287926	0.8283	64378192
0.604	20.0	303080	0.8283	67768656
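A short script can check what the results table implies. The sketch below parses the rows, copied verbatim from the table, and reports four facts: the lowest validation loss and its epoch, the number of optimizer steps per epoch, the epoch from which validation loss stops changing, and the mean number of input tokens per epoch. EpochResult, parse_results and summarize are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from typing import Dict, List

# Rows copied from the training results table:
# training loss, epoch, step, validation loss, input tokens seen
RESULTS_TEXT = """\
0.8503 1.0 15154 0.8988 3383904
0.7166 2.0 30308 0.8461 6774480
0.7149 3.0 45462 0.8269 10163152
0.8593 4.0 60616 0.8305 13552544
0.7253 5.0 75770 0.8279 16941120
0.7993 6.0 90924 0.8240 20331520
0.6568 7.0 106078 0.8246 23721840
0.5681 8.0 121232 0.8233 27111168
0.8977 9.0 136386 0.8304 30498912
0.9404 10.0 151540 0.8224 33886240
0.7244 11.0 166694 0.8290 37277536
0.6772 12.0 181848 0.8292 40664576
0.9906 13.0 197002 0.8227 44053792
0.6141 14.0 212156 0.8283 47441984
0.8149 15.0 227310 0.8283 50830480
0.8214 16.0 242464 0.8283 54215696
0.7346 17.0 257618 0.8283 57603296
0.8047 18.0 272772 0.8283 60987568
0.9349 19.0 287926 0.8283 64378192
0.604 20.0 303080 0.8283 67768656
"""


@dataclass(frozen=True)
class EpochResult:
    train_loss: float
    epoch: float
    step: int
    val_loss: float
    tokens_seen: int


def parse_results(text: str) -> List[EpochResult]:
    """Parse whitespace- or tab-separated rows into EpochResult records."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append(EpochResult(float(train_loss), float(epoch), int(step),
                                float(val_loss), int(tokens)))
    return rows


def summarize(rows: List[EpochResult]) -> Dict[str, object]:
    best = min(rows, key=lambda r: r.val_loss)
    # Steps added in each epoch (the first epoch starts from step 0).
    deltas, prev = set(), 0
    for r in rows:
        deltas.add(r.step - prev)
        prev = r.step
    # Earliest epoch after which validation loss no longer changes.
    plateau_start = rows[-1].epoch
    for r in reversed(rows):
        if r.val_loss != rows[-1].val_loss:
            break
        plateau_start = r.epoch
    return {
        "best_epoch": best.epoch,
        "best_val_loss": best.val_loss,
        "steps_per_epoch": deltas,
        "plateau_from_epoch": plateau_start,
        "mean_tokens_per_epoch": rows[-1].tokens_seen / len(rows),
    }


if __name__ == "__main__":
    for key, value in summarize(parse_results(RESULTS_TEXT)).items():
        print(f"{key}: {value}")
```

The lowest validation loss, 0.8224 at epoch 10, matches the loss reported at the top of this card. Validation loss stays at 0.8283 from epoch 14 onward. Each epoch advances by 15,154 steps. With train_batch_size 4, that would be 60,616 training examples per epoch only if training ran on a single device without gradient accumulation. The card states neither, so treat that figure as an assumption.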

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
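To reproduce the training environment, you can compare installed packages against the versions listed above. The sketch below uses only the standard library's importlib.metadata. It assumes the PyPI distribution names peft, transformers, torch, datasets and tokenizers; the card's "Pytorch" is published as torch. It compares only the public version, so a local build tag such as +cu128 is ignored. PINNED and check_versions are hypothetical names.

```python
from importlib import metadata
from typing import Dict, Mapping, Optional, Tuple

# Versions from the "Framework versions" list, keyed by assumed PyPI distribution name.
PINNED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def check_versions(
    pinned: Mapping[str, str] = PINNED,
) -> Dict[str, Tuple[Optional[str], str, bool]]:
    """Return {dist: (installed or None, card version, public versions match)}."""
    report = {}
    for dist, wanted in pinned.items():
        try:
            installed: Optional[str] = metadata.version(dist)
        except metadata.PackageNotFoundError:
            installed = None
        # Ignore local version labels such as "+cu128" when comparing.
        matches = installed is not None and installed.split("+")[0] == wanted.split("+")[0]
        report[dist] = (installed, wanted, matches)
    return report


if __name__ == "__main__":
    for dist, (installed, wanted, ok) in check_versions().items():
        print(f"{dist:<12} installed={installed} card={wanted} [{'ok' if ok else 'differs'}]")
```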


Model tree for rbelanec/train_sst2_42_1760637622

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model