train_sst2_456_1760637848

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.0566
Num Input Tokens Seen: 67744848

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
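
To make the schedule concrete, the following is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the peak learning rate and warmup ratio listed above and the final step count (303080) reported in the training results. The function name lr_at is hypothetical, and the formula assumes the common half-cosine decay to zero; the exact schedule used in the run may differ in details such as how the warmup step count is rounded.

```python
import math

# Values taken from the hyperparameter list and the final step of the run.
PEAK_LR = 0.03
WARMUP_RATIO = 0.1
TOTAL_STEPS = 303080


def lr_at(step: int,
          peak_lr: float = PEAK_LR,
          total_steps: int = TOTAL_STEPS,
          warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Assumes linear warmup from 0 to peak_lr, then half-cosine decay to 0.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 15154, 30308, 166694, 303080):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first 30308 steps, which is the first two epochs of the run, and the rate decays to zero at the final step.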

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0375	1.0	15154	0.0886	3391536
0.1115	2.0	30308	0.0646	6778448
0.0881	3.0	45462	0.0593	10165856
0.0959	4.0	60616	0.0603	13553232
0.1365	5.0	75770	0.0606	16942544
0.0097	6.0	90924	0.0571	20329264
0.0219	7.0	106078	0.0598	23711968
0.0538	8.0	121232	0.0574	27102656
0.1191	9.0	136386	0.0566	30492336
0.0207	10.0	151540	0.0591	33879088
0.0070	11.0	166694	0.0575	37261968
0.0028	12.0	181848	0.0592	40650384
0.0033	13.0	197002	0.0630	44037328
0.0231	14.0	212156	0.0605	47424688
0.0484	15.0	227310	0.0607	50810944
0.0144	16.0	242464	0.0611	54197216
0.0161	17.0	257618	0.0614	57585488
0.0348	18.0	272772	0.0615	60973008
0.0721	19.0	287926	0.0615	64361136
0.0052	20.0	303080	0.0614	67744848
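
As a quick check on how the table relates to the headline loss, the sketch below picks the epoch with the lowest validation loss. The numbers are copied from the table; the variable names are illustrative only.

```python
# (epoch, validation loss) pairs copied from the training results table.
VAL_LOSS = [
    (1, 0.0886), (2, 0.0646), (3, 0.0593), (4, 0.0603), (5, 0.0606),
    (6, 0.0571), (7, 0.0598), (8, 0.0574), (9, 0.0566), (10, 0.0591),
    (11, 0.0575), (12, 0.0592), (13, 0.0630), (14, 0.0605), (15, 0.0607),
    (16, 0.0611), (17, 0.0614), (18, 0.0615), (19, 0.0615), (20, 0.0614),
]
FINAL_STEP = 303080  # step count at epoch 20
NUM_EPOCHS = 20

# Epoch with the lowest validation loss.
best_epoch, best_loss = min(VAL_LOSS, key=lambda row: row[1])

# The step column grows by a constant amount per epoch.
steps_per_epoch = FINAL_STEP // NUM_EPOCHS
best_step = best_epoch * steps_per_epoch

print(f"best epoch: {best_epoch}, step: {best_step}, validation loss: {best_loss}")
```

This selects epoch 9 (step 136386), whose validation loss of 0.0566 matches the evaluation loss reported at the top of the card. Validation loss stays between roughly 0.057 and 0.063 from epoch 3 onward.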

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_sst2_456_1760637848

Base model: meta-llama/Meta-Llama-3-8B-Instruct

Adapter: this model
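
Since the card lists PEFT among the framework versions and describes this repository as an adapter of the base model, loading it might look like the sketch below. This is an assumption-laden example, not a documented usage: the helper name load_adapter_model is hypothetical, the adapter is assumed to target a causal language model, and the card does not specify a prompt format, so no inference call is shown. The base model may also require accepting its license on the Hub before it can be downloaded.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_sst2_456_1760637848"


def load_adapter_model(base_id: str = BASE_MODEL_ID,
                       adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (hypothetical helper).

    Assumes the adapter was trained on top of a causal language model.
    """
    # Imported lazily so this module can be imported without the
    # heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


# Example call (downloads several gigabytes of weights):
# tokenizer, model = load_adapter_model()
```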