train_sst2_789_1760637963

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.1405
Num Input Tokens Seen: 67736640

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
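
To make the learning-rate settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, written in plain Python. It is an illustration under stated assumptions, not the training code behind this card: the total step count is taken from the last row of the training results, the warmup length is assumed to be the total step count times the warmup ratio, and the decay is assumed to be a single half-cosine down to zero. The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical.

```python
import math

# Values copied from the hyperparameter list and the training results table.
LEARNING_RATE = 0.001
WARMUP_RATIO = 0.1
STEPS_PER_EPOCH = 15154
TOTAL_STEPS = 303080  # 20 epochs x 15154 steps

# Assumption: warmup length is the total step count scaled by the warmup ratio.
WARMUP_STEPS = int(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


# Print the scheduled learning rate at the end of a few epochs.
for epoch in (1, 2, 6, 11, 20):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:2d}  step {step:6d}  lr {lr_at(step):.6f}")
```

Under these assumptions, warmup covers the first 30308 steps, which is exactly the first two epochs, so the peak learning rate of 0.001 is reached at the end of epoch 2.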

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0462	1.0	15154	0.0622	3386112
0.0073	2.0	30308	0.0589	6772240
0.053	3.0	45462	0.0584	10157776
0.0038	4.0	60616	0.0590	13544064
0.0336	5.0	75770	0.0582	16932736
0.0495	6.0	90924	0.0573	20317792
0.1296	7.0	106078	0.0609	23704608
0.0074	8.0	121232	0.0594	27091680
0.1052	9.0	136386	0.0626	30478800
0.076	10.0	151540	0.0622	33864736
0.0051	11.0	166694	0.0662	37250256
0.0007	12.0	181848	0.0686	40636544
0.1467	13.0	197002	0.0768	44025456
0.0389	14.0	212156	0.0783	47407856
0.0045	15.0	227310	0.0875	50794368
0.1181	16.0	242464	0.0990	54183904
0.0012	17.0	257618	0.1037	57571280
0.0008	18.0	272772	0.1107	60960480
0.001	19.0	287926	0.1143	64347696
0.001	20.0	303080	0.1158	67736640
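
The table can be scanned for the epoch with the lowest validation loss. The sketch below copies the epoch and validation-loss columns from the table and picks the minimum; the variable names are hypothetical. The validation loss is lowest at epoch 6 and rises steadily afterwards, which may indicate overfitting in later epochs. The card does not say which checkpoint was kept.

```python
# (epoch, validation loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 0.0622), (2, 0.0589), (3, 0.0584), (4, 0.0590), (5, 0.0582),
    (6, 0.0573), (7, 0.0609), (8, 0.0594), (9, 0.0626), (10, 0.0622),
    (11, 0.0662), (12, 0.0686), (13, 0.0768), (14, 0.0783), (15, 0.0875),
    (16, 0.0990), (17, 0.1037), (18, 0.1107), (19, 0.1143), (20, 0.1158),
]

# Epoch with the lowest validation loss, and the loss after the final epoch.
best_epoch, best_loss = min(VALIDATION_LOSS, key=lambda row: row[1])
final_epoch, final_loss = VALIDATION_LOSS[-1]

print(f"best:  epoch {best_epoch}, validation loss {best_loss:.4f}")
print(f"final: epoch {final_epoch}, validation loss {final_loss:.4f}")
```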

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
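
With the PEFT and Transformers versions listed above, loading the adapter on top of the base model might look like the following sketch. Several things here are assumptions rather than facts from this card: that the adapter was trained for causal language modeling, that bfloat16 is an acceptable dtype, and that `rbelanec/train_sst2_789_1760637963` is the repository id of this adapter. The helper `load_adapter` is a hypothetical name. The card does not document the prompt format used for sst2, so no inference prompt is shown.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_sst2_789_1760637963"  # assumed repository id


def load_adapter(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Return (tokenizer, model) with the PEFT adapter attached to the base model."""
    # Heavy dependencies are imported lazily, only when the function is called.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Assumption: the base model is loaded as a causal LM in bfloat16.
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id, torch_dtype=torch.bfloat16
    )
    # Attach the fine-tuned adapter weights to the frozen base model.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter()` downloads the 8B base model and the adapter weights, so it needs substantial disk space and memory, and access to the base model repository may require accepting its license first.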


Model tree for rbelanec/train_sst2_789_1760637963

Base model

meta-llama/Meta-Llama-3-8B-Instruct
