train_sst2_456_1760637852

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.0623
Num Input Tokens Seen: 67744848

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
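The learning-rate settings above combine into a single schedule. The pure-Python sketch below reproduces the linear-warmup-then-cosine-decay formula used by Transformers' `get_cosine_schedule_with_warmup`, which is the scheduler that `lr_scheduler_type: cosine` selects. It assumes that the total number of training steps is the final step in the results table (303,080), that there is no gradient accumulation, and that warmup lasts `ceil(total_steps * warmup_ratio)` steps. Under those assumptions, warmup covers 30,308 steps, which is exactly the first two epochs. The function names are illustrative and do not belong to any library.

```python
import math

# Values from the hyperparameter list; TOTAL_STEPS is taken from the results table.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 303_080  # assumption: 20 epochs x 15,154 steps, no gradient accumulation


def warmup_steps(total_steps: int, warmup_ratio: float) -> int:
    # Assumption: warmup length is rounded up, as Transformers does for warmup_ratio.
    return math.ceil(total_steps * warmup_ratio)


def cosine_lr(step: int, total_steps: int = TOTAL_STEPS,
              peak_lr: float = PEAK_LR, warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step`: linear warmup from 0, then half-cosine decay to 0."""
    warmup = warmup_steps(total_steps, warmup_ratio)
    if step < warmup:
        # Linear ramp from 0 to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Learning rate at a few epoch boundaries from the results table.
    for step in (0, 15_154, 30_308, 166_694, 303_080):
        print(f"step {step:>7}: lr = {cosine_lr(step):.3e}")
```

Under this schedule the learning rate peaks at 5e-05 at the end of epoch 2 and halves by the end of epoch 11, the midpoint of the decay phase. It then reaches zero at the final step.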

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0857	1.0	15154	0.0893	3391536
0.0494	2.0	30308	0.0759	6778448
0.1631	3.0	45462	0.0685	10165856
0.1199	4.0	60616	0.0667	13553232
0.1304	5.0	75770	0.0647	16942544
0.0131	6.0	90924	0.0644	20329264
0.007	7.0	106078	0.0641	23711968
0.0761	8.0	121232	0.0632	27102656
0.1189	9.0	136386	0.0629	30492336
0.1355	10.0	151540	0.0623	33879088
0.0034	11.0	166694	0.0628	37261968
0.005	12.0	181848	0.0632	40650384
0.0041	13.0	197002	0.0632	44037328
0.0081	14.0	212156	0.0636	47424688
0.0489	15.0	227310	0.0632	50810944
0.0035	16.0	242464	0.0631	54197216
0.0065	17.0	257618	0.0638	57585488
0.0249	18.0	272772	0.0637	60973008
0.0256	19.0	287926	0.0640	64361136
0.0075	20.0	303080	0.0638	67744848
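The table can be checked programmatically. The sketch below copies the epoch, step, validation-loss, and token columns into a list and then finds the row with the lowest validation loss. It shows that the headline Loss of 0.0623 matches the epoch-10 evaluation, while the headline token count (67744848) is the total at the end of epoch 20. The card does not say which checkpoint the published adapter weights come from. The helper names are illustrative.

```python
# Rows copied from the results table: (epoch, step, validation_loss, input_tokens_seen).
RESULTS = [
    (1, 15154, 0.0893, 3391536),
    (2, 30308, 0.0759, 6778448),
    (3, 45462, 0.0685, 10165856),
    (4, 60616, 0.0667, 13553232),
    (5, 75770, 0.0647, 16942544),
    (6, 90924, 0.0644, 20329264),
    (7, 106078, 0.0641, 23711968),
    (8, 121232, 0.0632, 27102656),
    (9, 136386, 0.0629, 30492336),
    (10, 151540, 0.0623, 33879088),
    (11, 166694, 0.0628, 37261968),
    (12, 181848, 0.0632, 40650384),
    (13, 197002, 0.0632, 44037328),
    (14, 212156, 0.0636, 47424688),
    (15, 227310, 0.0632, 50810944),
    (16, 242464, 0.0631, 54197216),
    (17, 257618, 0.0638, 57585488),
    (18, 272772, 0.0637, 60973008),
    (19, 287926, 0.0640, 64361136),
    (20, 303080, 0.0638, 67744848),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


def tokens_per_step(rows):
    """Average input tokens consumed per training step over the whole run."""
    _, last_step, _, last_tokens = rows[-1]
    return last_tokens / last_step


if __name__ == "__main__":
    epoch, step, val_loss, _ = best_epoch(RESULTS)
    print(f"lowest validation loss {val_loss} at epoch {epoch} (step {step})")
    print(f"tokens seen by the end of training: {RESULTS[-1][3]}")
    print(f"average tokens per step: {tokens_per_step(RESULTS):.1f}")
```

The same list also confirms that every epoch is exactly 15,154 steps. Validation loss bottoms out at epoch 10 and drifts slightly upward afterwards, even though training loss keeps falling.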

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
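The framework list includes PEFT, which indicates that this repository holds an adapter rather than full model weights. A minimal loading sketch, assuming the adapter can be attached with `peft.PeftModel.from_pretrained` on top of the base model and that you have access to the gated `meta-llama/Meta-Llama-3-8B-Instruct` weights, is shown below. `device_map="auto"` additionally assumes `accelerate` is installed. The function name `load_sst2_adapter` is hypothetical. The prompt format used during fine-tuning is not documented in this card, so any classification prompt you send the model is a guess.

```python
def load_sst2_adapter(
    base_id: str = "meta-llama/Meta-Llama-3-8B-Instruct",
    adapter_id: str = "rbelanec/train_sst2_456_1760637852",
):
    """Load the base model and attach this PEFT adapter (sketch; downloads weights)."""
    # Imported lazily so the function can be defined without the heavy dependencies.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(
        base_id,
        torch_dtype=torch.bfloat16,  # assumption: bf16 inference; adjust to your hardware
        device_map="auto",           # assumption: accelerate is installed
    )
    # Wrap the base model with the adapter weights stored in this repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_sst2_adapter()
    print(type(model).__name__)
```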

Model tree for rbelanec/train_sst2_456_1760637852

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model