# train_wsc_456_1768397601
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:
- Loss: 0.3353
- Num Input Tokens Seen: 434784
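
The framework versions below list PEFT, so this checkpoint is presumably a parameter-efficient adapter rather than a full set of model weights. A minimal inference sketch under that assumption (the example prompt is illustrative; the exact prompt format used for training is not documented in this card):

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)

# Attach the fine-tuned adapter weights on top of the frozen base model.
model = PeftModel.from_pretrained(base, "rbelanec/train_wsc_456_1768397601")
model.eval()

# Illustrative Winograd-style prompt.
inputs = tokenizer(
    "The trophy didn't fit in the suitcase because it was too big. What was too big?",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```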
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
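
As a rough reconstruction, the hyperparameters above map onto `transformers.TrainingArguments` as sketched below; `output_dir` and the surrounding trainer wiring are assumptions, not taken from this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wsc_456_1768397601",  # assumed; not stated in the card
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,          # betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```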
### Training results

The best validation loss (0.3353, reached at step 500, epoch ~2.01) matches the evaluation loss reported above.

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5806 | 0.5020 | 125 | 0.3903 | 21856 |
| 0.3905 | 1.0040 | 250 | 0.3492 | 43616 |
| 0.3177 | 1.5060 | 375 | 0.3473 | 65728 |
| 0.3597 | 2.0080 | 500 | 0.3353 | 87264 |
| 0.3412 | 2.5100 | 625 | 0.3670 | 108800 |
| 0.3437 | 3.0120 | 750 | 0.3367 | 130752 |
| 0.3563 | 3.5141 | 875 | 0.3360 | 153248 |
| 0.3521 | 4.0161 | 1000 | 0.3649 | 174560 |
| 0.3514 | 4.5181 | 1125 | 0.3627 | 196896 |
| 0.3449 | 5.0201 | 1250 | 0.3473 | 218384 |
| 0.3636 | 5.5221 | 1375 | 0.3436 | 239584 |
| 0.3541 | 6.0241 | 1500 | 0.3560 | 261760 |
| 0.3572 | 6.5261 | 1625 | 0.3578 | 283744 |
| 0.4247 | 7.0281 | 1750 | 0.3530 | 305552 |
| 0.3511 | 7.5301 | 1875 | 0.3504 | 326736 |
| 0.3561 | 8.0321 | 2000 | 0.3551 | 349328 |
| 0.3485 | 8.5341 | 2125 | 0.3589 | 371040 |
| 0.3365 | 9.0361 | 2250 | 0.3598 | 392704 |
| 0.3674 | 9.5382 | 2375 | 0.3632 | 414608 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4