train_wsc_456_1760637769

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.6234 (the lowest validation loss in the training results, reached at epoch 14, step 1750)
Num Input Tokens Seen: 970208

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
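
To make the schedule concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the values listed above. It assumes the common formulation (linear ramp over the warmup steps, then a half-cosine decay to zero) and takes the total of 2500 steps from the training results table; the function name `cosine_lr` is hypothetical and this is not the trainer's actual code.

```python
import math

BASE_LR = 5e-05      # learning_rate from the card
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio from the card
TOTAL_STEPS = 2500   # final step in the training results table


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at optimizer step `step`: linear warmup, then half-cosine decay."""
    # round() guards against floating-point noise before taking the ceiling
    warmup_steps = math.ceil(round(total_steps * warmup_ratio, 9))
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions the warmup covers the first 250 steps (two epochs at 125 steps per epoch), after which the rate decays toward zero by step 2500.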

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5337	1.0	125	0.6900	48240
0.7393	2.0	250	0.6805	96896
0.5863	3.0	375	0.6551	145184
0.6055	4.0	500	0.6428	194384
0.6644	5.0	625	0.6511	242624
0.4849	6.0	750	0.6406	291216
0.6976	7.0	875	0.6307	339568
0.7129	8.0	1000	0.6311	388576
0.5988	9.0	1125	0.6273	436656
0.5223	10.0	1250	0.6278	485152
0.4151	11.0	1375	0.6315	533200
0.5423	12.0	1500	0.6345	581792
0.8036	13.0	1625	0.6269	630384
0.6415	14.0	1750	0.6234	678480
0.4146	15.0	1875	0.6297	727056
0.4333	16.0	2000	0.6285	775168
0.7040	17.0	2125	0.6306	824240
0.5489	18.0	2250	0.6307	872896
0.5340	19.0	2375	0.6379	921296
0.6336	20.0	2500	0.6296	970208
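
The table can be scanned programmatically for the best checkpoint. The sketch below copies the per-epoch validation losses from the table and picks the minimum; the helper name `best_epoch` is hypothetical, and whether the published adapter weights correspond to that checkpoint is an assumption the card does not confirm.

```python
# (epoch, validation_loss) pairs copied from the training results table
VAL_LOSS = [
    (1, 0.6900), (2, 0.6805), (3, 0.6551), (4, 0.6428), (5, 0.6511),
    (6, 0.6406), (7, 0.6307), (8, 0.6311), (9, 0.6273), (10, 0.6278),
    (11, 0.6315), (12, 0.6345), (13, 0.6269), (14, 0.6234), (15, 0.6297),
    (16, 0.6285), (17, 0.6306), (18, 0.6307), (19, 0.6379), (20, 0.6296),
]


def best_epoch(history):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(history, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    final_epoch, final_loss = VAL_LOSS[-1]
    print(f"best: epoch {epoch}, loss {loss:.4f}")
    print(f"final: epoch {final_epoch}, loss {final_loss:.4f}")
```

The minimum matches the evaluation loss reported at the top of the card, while the final epoch ends slightly higher; validation loss is essentially flat after epoch 9.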

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
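
Because PEFT is listed among the framework versions, the weights are presumably published as an adapter to be applied on top of the base model. A minimal loading sketch follows. It assumes the adapter is hosted under the repository id `rbelanec/train_wsc_456_1760637769`, that you have access to the gated base model, and that `transformers` and `peft` are installed; the helper name `load_adapter_model` is hypothetical.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_456_1760637769"  # assumed adapter repository id


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (downloads weights on first call)."""
    # Imported lazily so the heavy dependencies are only needed when loading
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the fine-tuned adapter weights
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` would then return objects usable with the standard `generate` API; dtype and device placement are left at their defaults here and would likely need adjusting for an 8B model.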

Model tree for rbelanec/train_wsc_456_1760637769

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model