train_sst2_456_1760637849

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 2.7879
Num Input Tokens Seen: 67744848

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
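
To make the interaction of learning_rate, lr_scheduler_type, and lr_scheduler_warmup_ratio concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule in plain Python. It assumes the common form of this schedule (a linear ramp from 0 to the peak rate, then half a cosine cycle down to 0) and takes the total step count of 303080 from the last row of the training results table. The helper name `lr_at_step` is illustrative, and how the training run rounded the warmup length is not stated in this card.

```python
import math

PEAK_LR = 0.001          # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 303080     # final step in the training results table

# Assumption: warmup length is the ratio times the total steps, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Learning rate at a given optimizer step (illustrative helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
    print(f"step {step:>6}: lr = {lr_at_step(step):.6f}")
```

Under these assumptions the warmup covers the first 30308 steps, which matches the step count at the end of epoch 2 in the results table.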

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0673	1.0	15154	0.0713	3391536
0.1176	2.0	30308	0.0667	6778448
0.0827	3.0	45462	0.0571	10165856
0.0999	4.0	60616	0.0557	13553232
0.1547	5.0	75770	0.0542	16942544
0.0132	6.0	90924	0.0543	20329264
0.015	7.0	106078	0.0574	23711968
0.0524	8.0	121232	0.0537	27102656
0.1257	9.0	136386	0.0564	30492336
0.0165	10.0	151540	0.0561	33879088
0.0064	11.0	166694	0.0591	37261968
0.0019	12.0	181848	0.0626	40650384
0.0042	13.0	197002	0.0682	44037328
0.0044	14.0	212156	0.0683	47424688
0.0074	15.0	227310	0.0731	50810944
0.0016	16.0	242464	0.0811	54197216
0.001	17.0	257618	0.0862	57585488
0.0042	18.0	272772	0.0879	60973008
0.0066	19.0	287926	0.0928	64361136
0.0007	20.0	303080	0.0933	67744848
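
The table can be summarized with a few lines of arithmetic. The sketch below copies the validation losses from the table, finds the epoch with the lowest value, and derives the step count per epoch and the average number of input tokens per step. The variable names are illustrative.

```python
# Validation loss per epoch, copied from the table above (epochs 1 to 20).
val_loss = [
    0.0713, 0.0667, 0.0571, 0.0557, 0.0542,
    0.0543, 0.0574, 0.0537, 0.0564, 0.0561,
    0.0591, 0.0626, 0.0682, 0.0683, 0.0731,
    0.0811, 0.0862, 0.0879, 0.0928, 0.0933,
]

FINAL_STEP = 303080        # step count at epoch 20
FINAL_TOKENS = 67744848    # input tokens seen at epoch 20
NUM_EPOCHS = 20

# Epoch (1-based) with the lowest validation loss.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]

steps_per_epoch = FINAL_STEP // NUM_EPOCHS
tokens_per_step = FINAL_TOKENS / FINAL_STEP

print(f"best epoch: {best_epoch} (validation loss {best_loss})")
print(f"steps per epoch: {steps_per_epoch}")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The lowest validation loss in the table is 0.0537 at epoch 8. After that the validation loss climbs to 0.0933 by epoch 20 while the training loss keeps falling, which is consistent with overfitting in the later epochs.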

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
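
Because the framework versions list PEFT and the model tree labels this checkpoint as an adapter on meta-llama/Meta-Llama-3-8B-Instruct, loading it presumably means attaching the adapter to the base model. The following is a minimal sketch under that assumption, not a recipe confirmed by the author. The repository id is taken from the model tree listing and the helper name `load_adapter` is hypothetical. The base model is gated and large, so the call needs access rights and substantial memory. The card does not document the prompt format used for SST-2, so none is shown.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_sst2_456_1760637849"


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Return (tokenizer, model) with the adapter attached to the base model."""
    # Imported lazily so that defining this helper does not require the
    # heavy libraries to be installed or loaded.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Attach the fine-tuned adapter weights on top of the base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter()` downloads both the base weights and the adapter, so it is best run once and the returned objects reused.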


Model tree for rbelanec/train_sst2_456_1760637849

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter