train_wsc_101112_1760637994

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4448
  • Num Input Tokens Seen: 980224
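
To use the adapter, load it on top of the base model with PEFT. This is a minimal loading sketch, assuming the adapter is published under a Hub repo id matching the model name (rbelanec/train_wsc_101112_1760637994) and that you have access to the gated base model; dtype and device settings are illustrative:

```python
# Minimal loading sketch; the adapter repo id and the dtype/device settings
# are assumptions, not taken from the original training setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_101112_1760637994"  # assumed Hub path

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```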

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
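
As a rough reconstruction, the hyperparameters above map onto a transformers TrainingArguments configuration like the one below. This is a hedged sketch of the settings only, not the original training script; the output_dir value and the surrounding Trainer wiring are assumptions.

```python
# Illustrative reconstruction of the hyperparameters above as
# TrainingArguments; not the original training script.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760637994",  # assumed output directory
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",         # AdamW defaults: betas=(0.9, 0.999), eps=1e-08
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```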

Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.359         | 1.0   | 125  | 0.3505          | 48944             |
| 0.3577        | 2.0   | 250  | 0.3474          | 98080             |
| 0.3573        | 3.0   | 375  | 0.4423          | 146624            |
| 0.4117        | 4.0   | 500  | 0.3616          | 196192            |
| 0.3537        | 5.0   | 625  | 0.3593          | 245216            |
| 0.3432        | 6.0   | 750  | 0.3444          | 294128            |
| 0.359         | 7.0   | 875  | 0.3507          | 342416            |
| 0.3432        | 8.0   | 1000 | 0.3508          | 391552            |
| 0.3542        | 9.0   | 1125 | 0.3508          | 440848            |
| 0.3538        | 10.0  | 1250 | 0.3524          | 488816            |
| 0.343         | 11.0  | 1375 | 0.3504          | 537840            |
| 0.343         | 12.0  | 1500 | 0.3526          | 586624            |
| 0.3434        | 13.0  | 1625 | 0.3593          | 635776            |
| 0.3304        | 14.0  | 1750 | 0.3679          | 684416            |
| 0.3452        | 15.0  | 1875 | 0.3741          | 733488            |
| 0.3542        | 16.0  | 2000 | 0.3641          | 782688            |
| 0.349         | 17.0  | 2125 | 0.3981          | 831792            |
| 0.3058        | 18.0  | 2250 | 0.4066          | 881328            |
| 0.3421        | 19.0  | 2375 | 0.4308          | 930656            |
| 0.3058        | 20.0  | 2500 | 0.4317          | 980224            |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4