train_wsc_101112_1760637996

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.3487
Num Input Tokens Seen: 980224

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
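
To make the scheduler settings concrete, here is a minimal plain-Python sketch of a cosine schedule with linear warmup. It assumes 2500 total optimizer steps (the last step listed in the training results), a warmup covering 10% of those steps, and a single half-cosine decay to zero, which is the usual shape of the cosine scheduler in Transformers. The helper name lr_at is hypothetical and not part of any library.

```python
import math

PEAK_LR = 5e-05       # learning_rate from the card
TOTAL_STEPS = 2500    # assumption: final step in the training results table
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the card
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Hypothetical helper: learning rate after `step` optimizer steps."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / WARMUP_STEPS
    # Half-cosine decay from the peak learning rate down to 0.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the learning rate peaks at step 250, that is, at the end of epoch 2, and decays for the remaining 18 epochs.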

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3582	1.0	125	0.3539	48944
0.3572	2.0	250	0.3494	98080
0.3497	3.0	375	0.3487	146624
0.5069	4.0	500	0.3682	196192
0.3625	5.0	625	0.3678	245216
0.4367	6.0	750	0.3667	294128
0.3587	7.0	875	0.3624	342416
0.3383	8.0	1000	0.3683	391552
0.3744	9.0	1125	0.3944	440848
0.3203	10.0	1250	0.6999	488816
0.3141	11.0	1375	0.6510	537840
0.3207	12.0	1500	1.2558	586624
0.4775	13.0	1625	1.5535	635776
0.1342	14.0	1750	1.7732	684416
0.2525	15.0	1875	2.2903	733488
0.1895	16.0	2000	2.8244	782688
0.1377	17.0	2125	3.0176	831792
0.0095	18.0	2250	3.2261	881328
0.1878	19.0	2375	3.2861	930656
0.1642	20.0	2500	3.3436	980224
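
Validation loss in the table bottoms out early and then climbs steeply while training loss trends downward, a pattern usually associated with overfitting. The sketch below copies the values from the table and uses a hypothetical helper, best_epoch, to pick the row with the lowest validation loss. It suggests that the 0.3487 evaluation loss reported at the top of the card matches the epoch 3 row and not the final one, although the card does not state which checkpoint was kept.

```python
# (epoch, step, training_loss, validation_loss), copied from the table above.
RESULTS = [
    (1, 125, 0.3582, 0.3539),
    (2, 250, 0.3572, 0.3494),
    (3, 375, 0.3497, 0.3487),
    (4, 500, 0.5069, 0.3682),
    (5, 625, 0.3625, 0.3678),
    (6, 750, 0.4367, 0.3667),
    (7, 875, 0.3587, 0.3624),
    (8, 1000, 0.3383, 0.3683),
    (9, 1125, 0.3744, 0.3944),
    (10, 1250, 0.3203, 0.6999),
    (11, 1375, 0.3141, 0.6510),
    (12, 1500, 0.3207, 1.2558),
    (13, 1625, 0.4775, 1.5535),
    (14, 1750, 0.1342, 1.7732),
    (15, 1875, 0.2525, 2.2903),
    (16, 2000, 0.1895, 2.8244),
    (17, 2125, 0.1377, 3.0176),
    (18, 2250, 0.0095, 3.2261),
    (19, 2375, 0.1878, 3.2861),
    (20, 2500, 0.1642, 3.3436),
]


def best_epoch(rows):
    """Hypothetical helper: return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


best = best_epoch(RESULTS)
final = RESULTS[-1]

if __name__ == "__main__":
    print(f"lowest validation loss: {best[3]} at epoch {best[0]} (step {best[1]})")
    print(f"final validation loss:  {final[3]} at epoch {final[0]} (step {final[1]})")
```

A selection like this is what early stopping or a keep-the-best-checkpoint policy would automate; whether either was enabled for this run is not documented here.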

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
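
Since PEFT is among the listed frameworks, the checkpoint is presumably an adapter that has to be attached to the base model instead of being loaded as a standalone model. The following is a minimal sketch under that assumption. The adapter repository id is taken from the model tree on this page, the tokenizer is assumed to be the base model's, and load_model_and_tokenizer is a hypothetical helper name. The base model may also require accepting its license on the Hub before it can be downloaded.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_101112_1760637996"  # assumption: repo id from the model tree


def load_model_and_tokenizer():
    """Hypothetical helper: load the base model and attach the adapter weights."""
    # Imported lazily so this file can be read or imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the adapter reuses the base model's tokenizer unchanged.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)

    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()
    return model, tokenizer
```

The prompt format used for the wsc examples during fine-tuning is not documented in this card, so any inference prompt would be a guess.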

Model tree for rbelanec/train_wsc_101112_1760637996

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model