train_sst2_789_1760637962

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the sst2 dataset (SST-2). It achieves the following results on the evaluation set:

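Loss: 0.0559 (the lowest validation loss in the training results below, reached at epoch 10)
Num Input Tokens Seen: 67736640 (cumulative total at the end of epoch 20)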

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
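The learning-rate schedule these settings imply can be sketched as follows. This is an illustration, not the training code: it assumes a single device with no gradient accumulation (the card states neither), takes 15154 optimizer steps per epoch from the "Step" column of the training results, and uses the common linear-warmup plus half-cosine-decay formula, which we believe matches the default cosine scheduler in Hugging Face Transformers. Under these assumptions, 10% warmup of 303080 total steps is 30308 steps, so the learning rate would peak at 0.03 exactly at the end of epoch 2.

```python
import math

# Values from the hyperparameter list and the training results table.
PEAK_LR = 0.03
WARMUP_RATIO = 0.1
STEPS_PER_EPOCH = 15154  # assumption: read off the "Step" column (epoch 1 ends at step 15154)
NUM_EPOCHS = 20
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS  # 303080, the final logged step


def warmup_steps(total_steps: int, ratio: float) -> int:
    """Number of linear-warmup steps implied by a warmup ratio (rounded up)."""
    return math.ceil(total_steps * ratio)


def cosine_lr(step: int, total_steps: int, warmup: int, peak_lr: float) -> float:
    """Linear warmup from 0 to peak_lr, then half-cosine decay from peak_lr to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    progress = (step - warmup) / max(1, total_steps - warmup)
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    warmup = warmup_steps(TOTAL_STEPS, WARMUP_RATIO)
    print(f"total steps: {TOTAL_STEPS}, warmup steps: {warmup}")
    for epoch in (1, 2, 5, 10, 15, 20):
        step = epoch * STEPS_PER_EPOCH
        lr = cosine_lr(step, TOTAL_STEPS, warmup, PEAK_LR)
        print(f"end of epoch {epoch:2d} (step {step}): lr = {lr:.5f}")
```

With these assumptions the decay is halfway (lr = 0.015) at step 166694, the end of epoch 11, and reaches 0 at the final step.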

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3456	1.0	15154	0.3410	3386112
0.0128	2.0	30308	0.0674	6772240
0.0664	3.0	45462	0.0585	10157776
0.0085	4.0	60616	0.0584	13544064
0.016	5.0	75770	0.0572	16932736
0.016	6.0	90924	0.0576	20317792
0.1063	7.0	106078	0.0569	23704608
0.0522	8.0	121232	0.0591	27091680
0.1168	9.0	136386	0.0560	30478800
0.0764	10.0	151540	0.0559	33864736
0.0529	11.0	166694	0.0569	37250256
0.0034	12.0	181848	0.0565	40636544
0.1384	13.0	197002	0.0573	44025456
0.0985	14.0	212156	0.0566	47407856
0.0278	15.0	227310	0.0588	50794368
0.0634	16.0	242464	0.0585	54183904
0.0322	17.0	257618	0.0586	57571280
0.035	18.0	272772	0.0585	60960480
0.0613	19.0	287926	0.0585	64347696
0.0152	20.0	303080	0.0586	67736640
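The headline evaluation loss can be cross-checked against this table. The sketch below copies the epoch, step, validation loss, and token columns verbatim, finds the epoch with the lowest validation loss, and derives per-epoch figures. The examples-per-epoch value assumes one device and no gradient accumulation, which the card does not state, so treat it as an estimate.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: int
    step: int
    val_loss: float
    tokens_seen: int


# Copied from the training results table above.
RESULTS = [EvalRow(*r) for r in [
    (1, 15154, 0.3410, 3386112), (2, 30308, 0.0674, 6772240),
    (3, 45462, 0.0585, 10157776), (4, 60616, 0.0584, 13544064),
    (5, 75770, 0.0572, 16932736), (6, 90924, 0.0576, 20317792),
    (7, 106078, 0.0569, 23704608), (8, 121232, 0.0591, 27091680),
    (9, 136386, 0.0560, 30478800), (10, 151540, 0.0559, 33864736),
    (11, 166694, 0.0569, 37250256), (12, 181848, 0.0565, 40636544),
    (13, 197002, 0.0573, 44025456), (14, 212156, 0.0566, 47407856),
    (15, 227310, 0.0588, 50794368), (16, 242464, 0.0585, 54183904),
    (17, 257618, 0.0586, 57571280), (18, 272772, 0.0585, 60960480),
    (19, 287926, 0.0585, 64347696), (20, 303080, 0.0586, 67736640),
]]

TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list


def best_row(rows: list[EvalRow]) -> EvalRow:
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda r: r.val_loss)


def steps_per_epoch(rows: list[EvalRow]) -> int:
    """Optimizer steps per epoch, inferred from the first logged row."""
    return rows[0].step // rows[0].epoch


if __name__ == "__main__":
    best = best_row(RESULTS)
    print(f"best epoch: {best.epoch}, validation loss: {best.val_loss}")
    spe = steps_per_epoch(RESULTS)
    examples = spe * TRAIN_BATCH_SIZE  # estimate: assumes 1 device, no accumulation
    final = RESULTS[-1]
    avg_tokens = final.tokens_seen / (examples * final.epoch)
    print(f"steps/epoch: {spe}, examples/epoch (est.): {examples}, "
          f"avg input tokens/example (est.): {avg_tokens:.1f}")
```

The lowest validation loss (0.0559, epoch 10) is the value reported at the top of the card; validation loss after epoch 10 stays between 0.0565 and 0.0588.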

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4


Model tree for rbelanec/train_sst2_789_1760637962

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Relationship to base model: adapter (this model)