train_sst2_1752763923

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the sst2 dataset. It achieves the following results on the evaluation set:

Loss: 0.1764
Num Input Tokens Seen: 37274384

Model description

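This is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct; the adapter type is not stated. More information needed.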

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 8
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
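
The combination of lr_scheduler_type: cosine and lr_scheduler_warmup_ratio: 0.1 can be illustrated with a small sketch. It assumes the usual shape of such a schedule (linear warmup over the first 10% of steps, then a half-cosine decay to zero); whether the training run used exactly this formula is an assumption. The function name `cosine_lr` and the total step count are hypothetical and chosen only for illustration.

```python
import math


def cosine_lr(step, total_steps, base_lr=5e-05, warmup_ratio=0.1):
    """Learning rate at `step` for linear warmup followed by cosine decay."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup phase.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical total step count, used only for illustration.
TOTAL_STEPS = 1000
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr(s, TOTAL_STEPS))
```

With these settings the rate peaks at learning_rate (5e-05) when warmup ends and falls toward zero by the final step.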

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4867	0.5001	3789	0.4814	1865280
0.3415	1.0001	7578	0.3408	3725296
0.3341	1.5002	11367	0.2871	5592496
0.2259	2.0003	15156	0.2488	7451728
0.2091	2.5003	18945	0.2320	9312720
0.2085	3.0004	22734	0.2201	11179712
0.1490	3.5005	26523	0.2128	13045888
0.1517	4.0005	30312	0.2055	14911072
0.2147	4.5006	34101	0.1990	16773344
0.1958	5.0007	37890	0.1934	18639616
0.2437	5.5007	41679	0.1902	20500224
0.2142	6.0008	45468	0.1873	22362704
0.1721	6.5009	49257	0.1836	24225104
0.1674	7.0009	53046	0.1807	26097984
0.1549	7.5010	56835	0.1792	27958656
0.2901	8.0011	60624	0.1775	29826656
0.1052	8.5011	64413	0.1771	31694624
0.2088	9.0012	68202	0.1767	33553312
0.2619	9.5013	71991	0.1764	35412768
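
As a rough sketch of how the table above could be inspected programmatically, the snippet below parses a few of its rows as tab-separated text and picks the evaluation point with the lowest validation loss. The helper name `best_checkpoint` is hypothetical, and only four rows are copied in to keep the example short; the remaining rows can be pasted in the same way.

```python
import csv
import io

# A few rows copied from the results table (tab-separated).
TABLE = (
    "Training Loss\tEpoch\tStep\tValidation Loss\tInput Tokens Seen\n"
    "0.4867\t0.5001\t3789\t0.4814\t1865280\n"
    "0.3415\t1.0001\t7578\t0.3408\t3725296\n"
    "0.2088\t9.0012\t68202\t0.1767\t33553312\n"
    "0.2619\t9.5013\t71991\t0.1764\t35412768\n"
)


def best_checkpoint(table_text):
    """Return the row (as a dict of strings) with the lowest validation loss."""
    rows = list(csv.DictReader(io.StringIO(table_text), delimiter="\t"))
    return min(rows, key=lambda r: float(r["Validation Loss"]))


best = best_checkpoint(TABLE)
print(best["Step"], best["Validation Loss"])  # prints: 71991 0.1764
```

The selected row matches the evaluation loss of 0.1764 reported at the top of the card.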

Framework versions

PEFT 0.15.2
Transformers 4.51.3
Pytorch 2.7.1+cu126
Datasets 3.6.0
Tokenizers 0.21.1
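
Since the card does not show how to use the adapter, here is a minimal loading sketch with the libraries listed above. It rests on assumptions the card does not confirm: that the adapter sits on a causal language modeling head, that the repository id shown on this page (rbelanec/train_sst2_1752763923) is the adapter location, and that you have access to the base model, which may require accepting its license. The function name `load_adapter` is hypothetical.

```python
def load_adapter(
    adapter_id="rbelanec/train_sst2_1752763923",
    base_id="meta-llama/Meta-Llama-3-8B-Instruct",
):
    """Load the base model and attach the PEFT adapter (assumed causal LM)."""
    # Imports are local so the function can be defined without the
    # libraries installed; calling it requires transformers and peft.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter()` downloads the full 8B base model, so it needs substantial disk space and memory. The prompt format used during fine-tuning is not documented here, so inputs may need experimentation.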

Model tree for rbelanec/train_sst2_1752763923

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model