# train_wsc_123_1760637655

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wsc dataset.
It achieves the following results on the evaluation set:
- Loss: 0.3634
- Num Input Tokens Seen: 977568
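
The adapter can be loaded on top of the base model with `peft` and `transformers`. Below is a minimal usage sketch; the adapter repo id is taken from this card, and the WSC-style prompt is only illustrative, since the exact prompt template used during training is not documented here.

```python
# Minimal usage sketch. Assumes this card's adapter is hosted at
# rbelanec/train_wsc_123_1760637655 and is a PEFT adapter on top of the
# base model named above.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_123_1760637655"  # adapter repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Hypothetical WSC-style prompt: ask which entity a pronoun refers to.
prompt = (
    "The trophy didn't fit in the suitcase because it was too big. "
    "What does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```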
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch mapping them onto `TrainingArguments` follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
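
For reference, here is a hedged sketch of how the values above map onto `transformers.TrainingArguments`; anything not listed above (such as `output_dir`) is an assumption, and the dataset loading, model setup, and PEFT config for this run are omitted because the card does not specify them.

```python
# Sketch of the training configuration above using the transformers Trainer API.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_123_1760637655",  # illustrative, not from the card
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```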
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5975 | 1.0 | 125 | 0.5597 | 49376 |
| 0.2925 | 2.0 | 250 | 0.4598 | 98240 |
| 0.355 | 3.0 | 375 | 0.4023 | 147648 |
| 0.3408 | 4.0 | 500 | 0.3779 | 197024 |
| 0.3269 | 5.0 | 625 | 0.3751 | 245472 |
| 0.3672 | 6.0 | 750 | 0.3678 | 293616 |
| 0.3265 | 7.0 | 875 | 0.3816 | 343040 |
| 0.3596 | 8.0 | 1000 | 0.3671 | 392080 |
| 0.3372 | 9.0 | 1125 | 0.3711 | 440848 |
| 0.331 | 10.0 | 1250 | 0.3698 | 490000 |
| 0.3279 | 11.0 | 1375 | 0.3697 | 538944 |
| 0.3436 | 12.0 | 1500 | 0.3703 | 587536 |
| 0.3575 | 13.0 | 1625 | 0.3698 | 636208 |
| 0.3652 | 14.0 | 1750 | 0.3717 | 685120 |
| 0.3541 | 15.0 | 1875 | 0.3664 | 734352 |
| 0.3581 | 16.0 | 2000 | 0.3687 | 782368 |
| 0.3276 | 17.0 | 2125 | 0.3659 | 831888 |
| 0.386 | 18.0 | 2250 | 0.3668 | 880112 |
| 0.3259 | 19.0 | 2375 | 0.3634 | 928992 |
| 0.3358 | 20.0 | 2500 | 0.3654 | 977568 |

### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4