train_sst2_456_1760637850

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

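Loss: 0.0640 (the lowest validation loss in the training results table, reached at epoch 2)
Num Input Tokens Seen: 67744848 (cumulative over all 20 epochs)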

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
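
The cosine schedule with a 0.1 warmup ratio listed above can be illustrated in plain Python. This is a minimal sketch, assuming the common shape of linear warmup followed by cosine decay to zero, and taking the total step count (303080) from the final row of the training results table. The helper name `lr_at_step` is hypothetical, and the trainer's own rounding of the warmup step count may differ.

```python
import math

PEAK_LR = 5e-5          # learning_rate from the hyperparameter list
TOTAL_STEPS = 303080    # final step in the training results table
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio from the hyperparameter list

# Assumption: warmup steps are the rounded product of ratio and total steps.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Hypothetical helper: linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, WARMUP_STEPS, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at_step(step):.3e}")
```

Under these assumptions warmup ends at step 30308, which coincides with the end of epoch 2 in the training results table.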

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1085	1.0	15154	0.0772	3391536
0.1464	2.0	30308	0.0640	6778448
0.0762	3.0	45462	0.0713	10165856
0.1118	4.0	60616	0.0748	13553232
0.0675	5.0	75770	0.1174	16942544
0.0001	6.0	90924	0.1612	20329264
0.0	7.0	106078	0.1695	23711968
0.0008	8.0	121232	0.1890	27102656
0.0003	9.0	136386	0.1944	30492336
0.0	10.0	151540	0.2490	33879088
0.0	11.0	166694	0.2101	37261968
0.0	12.0	181848	0.2487	40650384
0.0	13.0	197002	0.3418	44037328
0.0	14.0	212156	0.2726	47424688
0.0	15.0	227310	0.3367	50810944
0.0	16.0	242464	0.3564	54197216
0.0	17.0	257618	0.3742	57585488
0.0	18.0	272772	0.4022	60973008
0.0	19.0	287926	0.4067	64361136
0.0	20.0	303080	0.4075	67744848
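
In the table above, validation loss bottoms out early while training loss falls to zero. The sketch below picks the row with the lowest validation loss from those values. The names `ROWS` and `best_epoch` are illustrative only and do not come from any training tooling.

```python
# (epoch, training loss, validation loss) copied from the table above.
ROWS = [
    (1, 0.1085, 0.0772), (2, 0.1464, 0.0640), (3, 0.0762, 0.0713),
    (4, 0.1118, 0.0748), (5, 0.0675, 0.1174), (6, 0.0001, 0.1612),
    (7, 0.0, 0.1695), (8, 0.0008, 0.1890), (9, 0.0003, 0.1944),
    (10, 0.0, 0.2490), (11, 0.0, 0.2101), (12, 0.0, 0.2487),
    (13, 0.0, 0.3418), (14, 0.0, 0.2726), (15, 0.0, 0.3367),
    (16, 0.0, 0.3564), (17, 0.0, 0.3742), (18, 0.0, 0.4022),
    (19, 0.0, 0.4067), (20, 0.0, 0.4075),
]


def best_epoch(rows):
    """Return the (epoch, train_loss, val_loss) row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


if __name__ == "__main__":
    epoch, train_loss, val_loss = best_epoch(ROWS)
    print(f"best epoch: {epoch}, validation loss: {val_loss:.4f}")
    print(f"final validation loss: {ROWS[-1][2]:.4f}")
```

On these numbers the minimum is at epoch 2 (0.0640). By epoch 20 the validation loss is 0.4075, more than six times higher, which is consistent with overfitting after the early epochs.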

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
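
Because the card lists PEFT alongside Transformers, the checkpoint is presumably an adapter that is applied on top of the base model. The following is a minimal loading sketch, not the author's documented procedure. It assumes the adapter was trained on the causal language modeling head of the base model, and the helper name `load_adapter` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"  # base model named in this card
ADAPTER_ID = "rbelanec/train_sst2_456_1760637850"      # repository id given in this card


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Hypothetical helper: load the base model and attach the PEFT adapter."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter targets the causal-LM variant of the base model.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter()` downloads the 8B-parameter base model, which is gated on the Hub, so access must be granted and enough memory must be available before it will succeed.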

Model tree for rbelanec/train_sst2_456_1760637850

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model