train_sst2_123_1760637738

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.0631 (equal to the lowest validation loss in the training results, logged at epoch 8)
Num Input Tokens Seen: 67743008

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
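
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, using the learning rate and warmup ratio from the list and the final step count (303080) from the training results. The function `lr_at` and the rounding of the warmup length are assumptions made for illustration; the training framework's own scheduler may round or clamp slightly differently.

```python
import math

BASE_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 303080   # last step in the training results (20 epochs)

# Assumption: warmup length is the ratio times the total step count, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR during warmup.
        return BASE_LR * step / WARMUP_STEPS
    # Cosine decay from BASE_LR down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * BASE_LR * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 15154, WARMUP_STEPS, 151540, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup covers 30308 steps, which coincides with the step count logged at the end of epoch 2, so the peak learning rate would be reached roughly two epochs into the run.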

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0652	1.0	15154	0.0929	3385616
0.1307	2.0	30308	0.0749	6774096
0.0099	3.0	45462	0.0684	10161824
0.1462	4.0	60616	0.0664	13549104
0.0063	5.0	75770	0.0649	16935568
0.05	6.0	90924	0.0645	20320896
0.0978	7.0	106078	0.0637	23709008
0.0072	8.0	121232	0.0631	27099520
0.0606	9.0	136386	0.0633	30484864
0.0272	10.0	151540	0.0645	33869824
0.0154	11.0	166694	0.0642	37256608
0.0518	12.0	181848	0.0644	40640592
0.0079	13.0	197002	0.0651	44027424
0.0185	14.0	212156	0.0668	47415744
0.027	15.0	227310	0.0659	50803600
0.1355	16.0	242464	0.0666	54189408
0.0332	17.0	257618	0.0664	57578368
0.0562	18.0	272772	0.0671	60965904
0.1443	19.0	287926	0.0669	64353008
0.002	20.0	303080	0.0670	67743008
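
The table can be read more easily with a few lines of code. The sketch below copies the validation losses from the table above, finds the epoch with the lowest value, and compares it with the final epoch. The variable and function names are illustrative only.

```python
# Validation loss per epoch, copied from the training results table (epochs 1..20).
VALIDATION_LOSS = [
    0.0929, 0.0749, 0.0684, 0.0664, 0.0649,
    0.0645, 0.0637, 0.0631, 0.0633, 0.0645,
    0.0642, 0.0644, 0.0651, 0.0668, 0.0659,
    0.0666, 0.0664, 0.0671, 0.0669, 0.0670,
]

TOTAL_STEPS = 303080  # step logged at epoch 20
NUM_EPOCHS = 20       # num_epochs from the hyperparameter list


def best_epoch(losses):
    """Return (epoch, loss) for the lowest validation loss; epochs are 1-based."""
    index = min(range(len(losses)), key=lambda i: losses[i])
    return index + 1, losses[index]


epoch, loss = best_epoch(VALIDATION_LOSS)
steps_per_epoch = TOTAL_STEPS // NUM_EPOCHS
drift = VALIDATION_LOSS[-1] - loss  # how far the final epoch sits above the minimum

if __name__ == "__main__":
    print(f"best epoch: {epoch}, validation loss: {loss:.4f}")
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"final minus best validation loss: {drift:.4f}")
```

Validation loss bottoms out at epoch 8 and ends slightly higher at epoch 20, while training loss keeps fluctuating, so a checkpoint from around epoch 8 may generalize at least as well as the final one.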

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
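
Since PEFT is listed among the framework versions, the weights are presumably published as an adapter to be applied on top of the base model. The sketch below shows one way that might be done. It assumes the adapter lives at `rbelanec/train_sst2_123_1760637738`, that it was trained on the causal language modeling head, and that you have access to the gated base model; `load_adapter_model` is a hypothetical helper, and the prompt format used during training is not documented in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_sst2_123_1760637738"  # assumed repository id


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so this file can be read and imported without the
    # heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter targets the causal LM variant of the base model.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter_model()
    print(type(model).__name__)
```

Downloading the 8B base model requires substantial disk space and memory, so in practice you may want to pass a reduced-precision dtype or a device map when loading it.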

Model tree for rbelanec/train_sst2_123_1760637738

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model