train_wsc_123_1760637652

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 6.9383
Num Input Tokens Seen: 977568

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.001
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
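To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, written in plain Python. It assumes 2500 total optimizer steps (the final step in the training results table) and therefore 250 warmup steps (0.1 x 2500). The function name `lr_at` and the constants are illustrative, and the formula is the common linear-warmup plus half-cosine decay, which may differ in small details from the scheduler actually used in this run.

```python
import math

# Values taken from the hyperparameter list; TOTAL_STEPS is an assumption
# based on the last step reported in the training results table.
PEAK_LR = 0.001
TOTAL_STEPS = 2500
WARMUP_STEPS = 250  # lr_scheduler_warmup_ratio (0.1) x TOTAL_STEPS


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 125, 250, 1375, 2500):
        print(s, lr_at(s))
```

Under these assumptions the learning rate reaches its peak of 0.001 at step 250 (the end of epoch 2) and decays toward 0 by step 2500.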

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3549	1.0	125	0.4709	49376
0.343	2.0	250	0.3660	98240
0.3706	3.0	375	0.3483	147648
0.3322	4.0	500	0.3553	197024
0.3417	5.0	625	0.3575	245472
0.3543	6.0	750	0.3492	293616
0.3127	7.0	875	0.3661	343040
0.346	8.0	1000	0.3488	392080
0.326	9.0	1125	0.3519	440848
0.3382	10.0	1250	0.3510	490000
0.3372	11.0	1375	0.3552	538944
0.3673	12.0	1500	0.3610	587536
0.3505	13.0	1625	0.3524	636208
0.3504	14.0	1750	0.3494	685120
0.34	15.0	1875	0.3535	734352
0.3549	16.0	2000	0.3510	782368
0.3345	17.0	2125	0.3498	831888
0.3504	18.0	2250	0.3540	880112
0.3479	19.0	2375	0.3544	928992
0.3351	20.0	2500	0.3543	977568
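
The table above can be summarized with a few lines of Python. The sketch below copies the per-epoch rows by hand and reports the epoch with the lowest validation loss and the average number of input tokens per epoch. The variable names are illustrative and not part of the training code.

```python
# (epoch, training loss, validation loss, input tokens seen), copied from the table.
ROWS = [
    (1, 0.3549, 0.4709, 49376),
    (2, 0.343, 0.3660, 98240),
    (3, 0.3706, 0.3483, 147648),
    (4, 0.3322, 0.3553, 197024),
    (5, 0.3417, 0.3575, 245472),
    (6, 0.3543, 0.3492, 293616),
    (7, 0.3127, 0.3661, 343040),
    (8, 0.346, 0.3488, 392080),
    (9, 0.326, 0.3519, 440848),
    (10, 0.3382, 0.3510, 490000),
    (11, 0.3372, 0.3552, 538944),
    (12, 0.3673, 0.3610, 587536),
    (13, 0.3505, 0.3524, 636208),
    (14, 0.3504, 0.3494, 685120),
    (15, 0.34, 0.3535, 734352),
    (16, 0.3549, 0.3510, 782368),
    (17, 0.3345, 0.3498, 831888),
    (18, 0.3504, 0.3540, 880112),
    (19, 0.3479, 0.3544, 928992),
    (20, 0.3351, 0.3543, 977568),
]

# Row with the smallest validation loss (third field of each tuple).
best_epoch, _, best_val_loss, _ = min(ROWS, key=lambda row: row[2])

# Cumulative token count at the last epoch divided by the number of epochs.
tokens_per_epoch = ROWS[-1][3] / len(ROWS)

if __name__ == "__main__":
    print(f"best epoch: {best_epoch}, validation loss: {best_val_loss}")
    print(f"average input tokens per epoch: {tokens_per_epoch}")
```

With these rows, the lowest validation loss is 0.3483 at epoch 3, and validation loss stays between roughly 0.35 and 0.37 for the remaining epochs.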

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
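
Since the card lists PEFT among its frameworks, the checkpoint is presumably a PEFT adapter applied on top of the base model. The sketch below shows one way such an adapter is typically loaded. The repository id `rbelanec/train_wsc_123_1760637652`, the helper name `load_adapter_model`, and the assumption that the adapter loads through `PeftModel.from_pretrained` are not confirmed by the card itself. The base model is gated on the Hugging Face Hub, so access must be granted before it can be downloaded.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_123_1760637652"  # assumed repository id


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Hypothetical helper: load the base model and attach the adapter."""
    # Imports live inside the function so this file can be imported
    # without the heavy dependencies installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the full 8B base model, so it is best run on a machine with enough memory and a configured Hub token.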


Model tree for rbelanec/train_wsc_123_1760637652

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model