train_sst2_123_1760637736

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

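Loss: 0.0662 (matches the validation loss at epoch 2, step 30308, the lowest value in the training results table)
Num Input Tokens Seen: 67743008 (cumulative total at the end of epoch 20)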

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
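
To make the schedule concrete, the sketch below computes the learning rate at a given optimizer step for a cosine schedule with linear warmup, using the values listed above. It rests on several assumptions that the card does not confirm: that the scheduler horizon is the 303080 steps reported in the training results table, that the warmup length is the warmup ratio times that total, and that the decay is a half cosine down to zero. The function name `lr_at_step` is hypothetical, and the actual run may differ in details such as how the warmup length is rounded.

```python
import math

PEAK_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 303080   # final step in the training results table (assumed scheduler horizon)

# Assumption: warmup length is the ratio times the total step count, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at_step(step: int) -> float:
    """Learning rate under linear warmup followed by half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 15154, WARMUP_STEPS, 151540, TOTAL_STEPS):
        print(f"step {step:>6}: lr = {lr_at_step(step):.3e}")
```

Under these assumptions, warmup would end at step 30308, which is also the step at which epoch 2 finishes in the training results table.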

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0394	1.0	15154	0.0775	3385616
0.1653	2.0	30308	0.0662	6774096
0.017	3.0	45462	0.0811	10161824
0.1168	4.0	60616	0.0955	13549104
0.0003	5.0	75770	0.1061	16935568
0.1284	6.0	90924	0.1878	20320896
0.147	7.0	106078	0.1725	23709008
0.0	8.0	121232	0.1877	27099520
0.0001	9.0	136386	0.1514	30484864
0.0	10.0	151540	0.2051	33869824
0.0	11.0	166694	0.2066	37256608
0.0001	12.0	181848	0.1747	40640592
0.0	13.0	197002	0.2491	44027424
0.0158	14.0	212156	0.2506	47415744
0.0	15.0	227310	0.2808	50803600
0.0	16.0	242464	0.3553	54189408
0.0	17.0	257618	0.3760	57578368
0.0	18.0	272772	0.3955	60965904
0.0	19.0	287926	0.4011	64353008
0.0	20.0	303080	0.4019	67743008
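
The table shows the validation loss bottoming out early and then climbing while the training loss falls toward zero, a pattern usually read as overfitting. The sketch below copies the epoch, step, training loss, and validation loss columns from the table and picks out the epoch with the lowest validation loss. The helper name `best_epoch` is hypothetical, and whether the published weights correspond to that epoch or to the final one is not stated in the card.

```python
# (epoch, step, training_loss, validation_loss) copied from the table above
ROWS = [
    (1, 15154, 0.0394, 0.0775),
    (2, 30308, 0.1653, 0.0662),
    (3, 45462, 0.017, 0.0811),
    (4, 60616, 0.1168, 0.0955),
    (5, 75770, 0.0003, 0.1061),
    (6, 90924, 0.1284, 0.1878),
    (7, 106078, 0.147, 0.1725),
    (8, 121232, 0.0, 0.1877),
    (9, 136386, 0.0001, 0.1514),
    (10, 151540, 0.0, 0.2051),
    (11, 166694, 0.0, 0.2066),
    (12, 181848, 0.0001, 0.1747),
    (13, 197002, 0.0, 0.2491),
    (14, 212156, 0.0158, 0.2506),
    (15, 227310, 0.0, 0.2808),
    (16, 242464, 0.0, 0.3553),
    (17, 257618, 0.0, 0.3760),
    (18, 272772, 0.0, 0.3955),
    (19, 287926, 0.0, 0.4011),
    (20, 303080, 0.0, 0.4019),
]


def best_epoch(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


best = best_epoch(ROWS)
final = ROWS[-1]
steps_per_epoch = ROWS[0][1]

if __name__ == "__main__":
    print(f"steps per epoch: {steps_per_epoch}")
    print(f"best epoch: {best[0]} (validation loss {best[3]})")
    print(f"final epoch: {final[0]} (validation loss {final[3]})")
    print(f"final / best validation loss: {final[3] / best[3]:.2f}")
```

A check like this is one way to choose a checkpoint or an early-stopping point when rerunning the training; here it points to epoch 2, whose validation loss equals the loss reported at the top of the card.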

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
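
Since PEFT is among the frameworks listed, the published weights are presumably an adapter that has to be attached to the base model before use. The following is a minimal loading sketch under that assumption. It also assumes that the adapter targets a causal language model and that the repository id is rbelanec/train_sst2_123_1760637736 as shown on this page. The function name `load_adapter` is hypothetical, and the prompt format used during fine-tuning is not documented in the card, so no inference example is given.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_sst2_123_1760637736"  # repository id as shown on this page


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter weights.

    Assumption: the adapter was trained on a causal language model.
    Calling this downloads the full base model, which is several gigabytes.
    """
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter()
    print(type(model).__name__)
```

Access to the base model may require accepting its license on the hosting platform first.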

Model tree for rbelanec/train_sst2_123_1760637736

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
