train_sst2_1754652138

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 1.1318
Num Input Tokens Seen: 33869824

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
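The cosine scheduler with a 0.1 warmup ratio can be illustrated with a small standalone sketch. It assumes the schedule has the linear-warmup, half-cosine-decay shape of `get_cosine_schedule_with_warmup` in transformers, with warmup length computed as ceil(total_steps × warmup_ratio). It also assumes the run had 151540 optimizer steps in total, which is the final step in the training results. The function name `cosine_with_warmup_lr` is hypothetical, and this is not the trainer's own code.

```python
import math


def cosine_with_warmup_lr(step: int, peak_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Learning rate at `step`: linear warmup, then a half-cosine decay to 0.

    Hypothetical helper; assumed to match the shape of the cosine-with-warmup
    schedule in transformers (the card only names the scheduler type and ratio).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to peak_lr over the warmup steps.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half a cosine period: factor falls from 1 at the end of warmup to 0 at the last step.
    return peak_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Values from the card: peak learning rate, warmup ratio, and the final step
# of the training-results table (assumed to be the total number of steps).
PEAK_LR = 5e-05
TOTAL_STEPS = 151540
WARMUP_RATIO = 0.1

for step in (0, 7577, 15154, 75770, 151540):
    print(step, f"{cosine_with_warmup_lr(step, PEAK_LR, TOTAL_STEPS, WARMUP_RATIO):.3e}")
```

Under these assumptions warmup covers the first epoch (about 15154 steps). By epoch 5.0 (step 75770) the rate has decayed to roughly 59% of the peak, and it reaches 0 at the final step.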

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0255	0.5	7577	0.0898	1694048
0.038	1.0	15154	0.0722	3385616
0.2065	1.5	22731	0.0617	5082864
0.1293	2.0	30308	0.0609	6774096
0.0045	2.5	37885	0.0607	8467152
0.0071	3.0	45462	0.0596	10161824
0.0268	3.5	53039	0.0624	11856000
0.1264	4.0	60616	0.0588	13549104
0.0303	4.5	68193	0.0586	15241168
0.0083	5.0	75770	0.0581	16935568
0.0731	5.5	83347	0.0597	18626160
0.0214	6.0	90924	0.0609	20320896
0.0863	6.5	98501	0.0602	22013696
0.0855	7.0	106078	0.0599	23709008
0.0718	7.5	113655	0.0664	25400400
0.0024	8.0	121232	0.0654	27099520
0.0028	8.5	128809	0.0647	28792480
0.027	9.0	136386	0.0670	30484864
0.0365	9.5	143963	0.0673	32173664
0.0098	10.0	151540	0.0675	33869824
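The training-results table can be summarized with a short script. The sketch below finds the checkpoint with the lowest validation loss and derives per-step figures. The rows are copied from the table above. The examples-per-epoch figure assumes a single device and no gradient accumulation, which the card does not state. The token count is whatever the trainer recorded as input tokens seen, which may include padding. All names here are illustrative.

```python
# Rows from the training-results table: (epoch, step, validation_loss, input_tokens_seen).
ROWS = [
    (0.5, 7577, 0.0898, 1694048),
    (1.0, 15154, 0.0722, 3385616),
    (1.5, 22731, 0.0617, 5082864),
    (2.0, 30308, 0.0609, 6774096),
    (2.5, 37885, 0.0607, 8467152),
    (3.0, 45462, 0.0596, 10161824),
    (3.5, 53039, 0.0624, 11856000),
    (4.0, 60616, 0.0588, 13549104),
    (4.5, 68193, 0.0586, 15241168),
    (5.0, 75770, 0.0581, 16935568),
    (5.5, 83347, 0.0597, 18626160),
    (6.0, 90924, 0.0609, 20320896),
    (6.5, 98501, 0.0602, 22013696),
    (7.0, 106078, 0.0599, 23709008),
    (7.5, 113655, 0.0664, 25400400),
    (8.0, 121232, 0.0654, 27099520),
    (8.5, 128809, 0.0647, 28792480),
    (9.0, 136386, 0.0670, 30484864),
    (9.5, 143963, 0.0673, 32173664),
    (10.0, 151540, 0.0675, 33869824),
]
TRAIN_BATCH_SIZE = 4  # from the hyperparameter list

# Checkpoint with the lowest validation loss (third column).
best = min(ROWS, key=lambda row: row[2])
final = ROWS[-1]

# Optimizer steps in one epoch, read from the epoch-1.0 row.
steps_per_epoch = next(step for epoch, step, _, _ in ROWS if epoch == 1.0)
# Upper bound on training examples per epoch (the last batch may be partial),
# assuming one device and no gradient accumulation.
examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

# Average input tokens counted per optimizer step over the whole run.
tokens_per_step = final[3] / final[1]

print(f"lowest validation loss {best[2]} at epoch {best[0]} (step {best[1]})")
print(f"final validation loss {final[2]} at epoch {final[0]} (step {final[1]})")
print(f"{steps_per_epoch} steps/epoch, up to {examples_per_epoch} examples/epoch")
print(f"about {tokens_per_step:.1f} input tokens per step")
```

Validation loss bottoms out mid-run (epoch 5.0) and drifts upward afterwards. If the uploaded weights come from the last step, they are not the lowest-validation-loss checkpoint in this table.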

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1


Model tree for rbelanec/train_sst2_1754652138

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: an adapter of the base model