# train_sst2_789_1760637966
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the sst2 dataset. It achieves the following results on the evaluation set:
- Loss: 0.0604 (the best validation loss in the training results below, reached at epoch 8)
- Num Input Tokens Seen: 67736640
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 789
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
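
The card does not include a training script, so the following is only a minimal sketch of how the listed hyperparameters could be mapped onto Hugging Face `TrainingArguments`. The output directory is an assumption, and the PEFT/LoRA configuration, dataset preprocessing, and `Trainer` call are omitted because they are not recorded here.

```python
# Minimal sketch: the hyperparameters above expressed as TrainingArguments.
# PEFT/LoRA config, dataset loading, and the Trainer call are not recorded
# in this card and are therefore omitted.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_sst2_789_1760637966",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```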
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1534 | 1.0 | 15154 | 0.0902 | 3386112 |
| 0.0187 | 2.0 | 30308 | 0.0714 | 6772240 |
| 0.026 | 3.0 | 45462 | 0.0645 | 10157776 |
| 0.0077 | 4.0 | 60616 | 0.0624 | 13544064 |
| 0.0369 | 5.0 | 75770 | 0.0619 | 16932736 |
| 0.0302 | 6.0 | 90924 | 0.0608 | 20317792 |
| 0.0747 | 7.0 | 106078 | 0.0618 | 23704608 |
| 0.0115 | 8.0 | 121232 | 0.0604 | 27091680 |
| 0.1018 | 9.0 | 136386 | 0.0635 | 30478800 |
| 0.0569 | 10.0 | 151540 | 0.0614 | 33864736 |
| 0.0607 | 11.0 | 166694 | 0.0623 | 37250256 |
| 0.0023 | 12.0 | 181848 | 0.0627 | 40636544 |
| 0.1343 | 13.0 | 197002 | 0.0638 | 44025456 |
| 0.0789 | 14.0 | 212156 | 0.0624 | 47407856 |
| 0.0208 | 15.0 | 227310 | 0.0641 | 50794368 |
| 0.0571 | 16.0 | 242464 | 0.0639 | 54183904 |
| 0.0365 | 17.0 | 257618 | 0.0643 | 57571280 |
| 0.0073 | 18.0 | 272772 | 0.0643 | 60960480 |
| 0.0996 | 19.0 | 287926 | 0.0642 | 64347696 |
| 0.0032 | 20.0 | 303080 | 0.0643 | 67736640 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
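
The checkpoint is a PEFT adapter rather than a full model. Below is a minimal loading sketch, assuming the adapter lives under the repository id used for this card (rbelanec/train_sst2_789_1760637966) and that the gated base model is accessible; the prompt format is an assumption and may not match the one used during fine-tuning.

```python
# Minimal sketch: loading the PEFT adapter on top of the base model.
# The prompt below is an illustrative assumption, not the training format.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "rbelanec/train_sst2_789_1760637966")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Classify the sentiment of the following sentence as positive or negative: the movie was wonderful"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```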