train_wsc_101112_1760637997

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

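Loss: 0.6402 (the lowest validation loss in the training results, reached at epoch 9.0, step 1125)
Num Input Tokens Seen: 980224 (the total after epoch 20.0)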

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
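
As a rough illustration of how the learning-rate settings above interact, the sketch below reproduces a linear-warmup-then-cosine-decay schedule in plain Python. It assumes 2500 total optimizer steps (the final step reported in the training results), warmup covering the first 10% of them, and a half-cosine decay to zero. The function name `lr_at` and the constants are illustrative and are not taken from the training code.

```python
import math

PEAK_LR = 5e-05        # learning_rate
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio
TOTAL_STEPS = 2500     # assumption: final step reported in the training results
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 250 under this assumption


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (illustrative helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(step, lr_at(step))
```

Under these assumptions the rate peaks at step 250, which corresponds to the end of epoch 2, and falls to roughly half the peak at step 1375.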

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.9665	1.0	125	0.6890	48944
0.5964	2.0	250	0.6836	98080
0.8274	3.0	375	0.6709	146624
0.8546	4.0	500	0.6651	196192
0.4953	5.0	625	0.6512	245216
0.6403	6.0	750	0.6573	294128
0.5437	7.0	875	0.6495	342416
0.6669	8.0	1000	0.6479	391552
0.6898	9.0	1125	0.6402	440848
0.5976	10.0	1250	0.6437	488816
0.8871	11.0	1375	0.6521	537840
0.5946	12.0	1500	0.6476	586624
0.5590	13.0	1625	0.6445	635776
0.4213	14.0	1750	0.6499	684416
0.7426	15.0	1875	0.6513	733488
0.5860	16.0	2000	0.6520	782688
0.7335	17.0	2125	0.6495	831792
0.5986	18.0	2250	0.6465	881328
0.6066	19.0	2375	0.6475	930656
0.6119	20.0	2500	0.6521	980224
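
To make the table easier to read at a glance, the following sketch picks out the epoch with the lowest validation loss and compares it with the final epoch. The rows are copied from the table above. The `examples_per_epoch` figure is only an estimate and assumes a single device with no gradient accumulation, neither of which this card states.

```python
# (epoch, step, validation_loss, input_tokens_seen), copied from the table above
ROWS = [
    (1, 125, 0.6890, 48944),
    (2, 250, 0.6836, 98080),
    (3, 375, 0.6709, 146624),
    (4, 500, 0.6651, 196192),
    (5, 625, 0.6512, 245216),
    (6, 750, 0.6573, 294128),
    (7, 875, 0.6495, 342416),
    (8, 1000, 0.6479, 391552),
    (9, 1125, 0.6402, 440848),
    (10, 1250, 0.6437, 488816),
    (11, 1375, 0.6521, 537840),
    (12, 1500, 0.6476, 586624),
    (13, 1625, 0.6445, 635776),
    (14, 1750, 0.6499, 684416),
    (15, 1875, 0.6513, 733488),
    (16, 2000, 0.6520, 782688),
    (17, 2125, 0.6495, 831792),
    (18, 2250, 0.6465, 881328),
    (19, 2375, 0.6475, 930656),
    (20, 2500, 0.6521, 980224),
]

TRAIN_BATCH_SIZE = 4  # train_batch_size from the hyperparameters

# Row with the lowest validation loss, and the last logged row.
best = min(ROWS, key=lambda row: row[2])
final = ROWS[-1]

# Assumption: one device and no gradient accumulation, so one step = one batch.
steps_per_epoch = ROWS[0][1]
examples_per_epoch = steps_per_epoch * TRAIN_BATCH_SIZE

print(f"best:  epoch {best[0]}, step {best[1]}, validation loss {best[2]:.4f}")
print(f"final: epoch {final[0]}, step {final[1]}, validation loss {final[2]:.4f}")
print(f"estimated examples per epoch: {examples_per_epoch}")
```

The minimum falls at epoch 9 (0.6402), while the final epoch ends at 0.6521, so validation loss did not improve over the second half of training.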

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
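
With the PEFT and Transformers versions listed above, attaching this adapter to the base model might look like the sketch below. The helper name `load_adapter_model` is hypothetical, the adapter ID is the repository name shown on this page, and access to the base model weights on the Hub is assumed. The card gives no recommended dtype, device placement, or prompt format, so none is set here.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_101112_1760637997"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Imported lazily so that defining this helper does not require the libraries.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)

    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads both the base model and the adapter, so it needs enough memory for an 8B-parameter model.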

Model tree for rbelanec/train_wsc_101112_1760637997

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model