train_wsc_123_1760637654

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

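Loss: 0.5301 (the lowest validation loss in the training results, reached at epoch 15, step 1875)
Num Input Tokens Seen: 977568 (cumulative total after all 20 epochs)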

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
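
To make the interaction between learning_rate, lr_scheduler_type, and lr_scheduler_warmup_ratio concrete, here is a minimal sketch of a linear-warmup, cosine-decay schedule. It assumes the schedule spans 2500 optimizer steps (the final step count in the training results) and uses the common half-cosine formula. The function name `lr_at` is hypothetical, and the sketch is not taken from the training code itself.

```python
import math

BASE_LR = 5e-05       # learning_rate from the list above
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 2500    # assumption: final step count reported in the training results


def lr_at(step: int, total_steps: int = TOTAL_STEPS) -> float:
    """Learning rate at a given optimizer step: linear warmup, then half-cosine decay."""
    warmup_steps = math.ceil(total_steps * WARMUP_RATIO)
    if step < warmup_steps:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first 250 steps (two epochs), the rate peaks at 5e-05 at step 250, and it decays to zero by step 2500.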

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5972	1.0	125	0.5531	49376
0.4086	2.0	250	0.5551	98240
0.5509	3.0	375	0.5435	147648
0.5025	4.0	500	0.5480	197024
0.6751	5.0	625	0.5442	245472
0.6188	6.0	750	0.5384	293616
0.6894	7.0	875	0.5318	343040
0.7383	8.0	1000	0.5344	392080
0.7217	9.0	1125	0.5357	440848
0.5629	10.0	1250	0.5319	490000
0.4892	11.0	1375	0.5340	538944
0.3806	12.0	1500	0.5332	587536
0.6089	13.0	1625	0.5344	636208
0.4514	14.0	1750	0.5356	685120
0.7416	15.0	1875	0.5301	734352
0.5128	16.0	2000	0.5323	782368
0.7102	17.0	2125	0.5388	831888
0.972	18.0	2250	0.5319	880112
0.4493	19.0	2375	0.5365	928992
0.6916	20.0	2500	0.5365	977568
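
The table can be summarized with a few lines of code. The sketch below copies the epoch, step, validation loss, and token columns from the table and derives the best epoch, the steps per epoch, and the average tokens per epoch. The examples-per-epoch figure assumes a single device and no gradient accumulation, which the card does not state.

```python
# (epoch, step, validation_loss, input_tokens_seen), copied from the table above.
ROWS = [
    (1, 125, 0.5531, 49376), (2, 250, 0.5551, 98240),
    (3, 375, 0.5435, 147648), (4, 500, 0.5480, 197024),
    (5, 625, 0.5442, 245472), (6, 750, 0.5384, 293616),
    (7, 875, 0.5318, 343040), (8, 1000, 0.5344, 392080),
    (9, 1125, 0.5357, 440848), (10, 1250, 0.5319, 490000),
    (11, 1375, 0.5340, 538944), (12, 1500, 0.5332, 587536),
    (13, 1625, 0.5344, 636208), (14, 1750, 0.5356, 685120),
    (15, 1875, 0.5301, 734352), (16, 2000, 0.5323, 782368),
    (17, 2125, 0.5388, 831888), (18, 2250, 0.5319, 880112),
    (19, 2375, 0.5365, 928992), (20, 2500, 0.5365, 977568),
]

TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameter list

# Row with the lowest validation loss.
best_epoch, best_step, best_loss, _ = min(ROWS, key=lambda row: row[2])

# Steps per epoch follow from the first row (step 125 at epoch 1).
steps_per_epoch = ROWS[0][1] // ROWS[0][0]

# Assumption: one device, no gradient accumulation.
examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

# Average number of input tokens consumed per epoch.
tokens_per_epoch = ROWS[-1][3] / ROWS[-1][0]

if __name__ == "__main__":
    print(f"best validation loss {best_loss} at epoch {best_epoch} (step {best_step})")
    print(f"{steps_per_epoch} steps/epoch, about {examples_per_epoch} examples/epoch")
    print(f"about {tokens_per_epoch:.1f} input tokens/epoch")
```

Validation loss stays within a narrow band (0.5301 to 0.5551) across all 20 epochs, so the difference between the best checkpoint and the final one (0.5365) is small.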

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
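
Since the card lists PEFT alongside Transformers, the weights are presumably published as an adapter to be applied on top of the base model. The following is a minimal loading sketch under that assumption. The helper name `load_adapter` is hypothetical, the repository id is taken from the model tree on this page, and access to the base model may require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_123_1760637654"  # repository id shown in the model tree


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model, then attach the PEFT adapter on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    # Wrap the base model with the adapter weights from the adapter repository.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter()
    print(type(model).__name__)
```

Calling `load_adapter()` downloads the full 8B base model, so it needs enough memory and disk space for those weights in addition to the adapter.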

Model tree for rbelanec/train_wsc_123_1760637654

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Relation to base model: adapter (this model)