train_wsc_456_1760360293

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc (Winograd Schema Challenge) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.3281
  • Num Input Tokens Seen: 1,457,072
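
The adapter can be loaded on top of the base model with PEFT. Below is a minimal loading sketch, assuming the adapter is published under the repo id rbelanec/train_wsc_456_1760360293 referenced on this card:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_456_1760360293"  # repo id from this card

# Load the base model and tokenizer, then attach the fine-tuned adapter.
tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```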

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an equivalent TrainingArguments sketch follows the list):

  • learning_rate: 0.03
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
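
For reference, these settings correspond roughly to the following Hugging Face TrainingArguments configuration; this is a sketch, and output_dir (as well as any PEFT/LoRA-specific options, which are not recorded on this card) is a hypothetical placeholder:

```python
from transformers import TrainingArguments

# Sketch reconstructing the hyperparameters listed above; output_dir is hypothetical.
training_args = TrainingArguments(
    output_dir="train_wsc_456_1760360293",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```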

Training results

| Training Loss | Epoch  | Step | Validation Loss | Input Tokens Seen |
|---------------|--------|------|-----------------|-------------------|
| 0.3448        | 1.504  | 188  | 0.3321          | 73040             |
| 0.3337        | 3.008  | 376  | 0.3281          | 145504            |
| 0.3590        | 4.512  | 564  | 0.3586          | 219728            |
| 0.3448        | 6.016  | 752  | 0.3437          | 291856            |
| 0.3559        | 7.520  | 940  | 0.3506          | 364400            |
| 0.3635        | 9.024  | 1128 | 0.3401          | 437840            |
| 0.3465        | 10.528 | 1316 | 0.3494          | 510976            |
| 0.3506        | 12.032 | 1504 | 0.3492          | 583328            |
| 0.3481        | 13.536 | 1692 | 0.3376          | 655600            |
| 0.3556        | 15.040 | 1880 | 0.3573          | 728944            |
| 0.3495        | 16.544 | 2068 | 0.3337          | 801728            |
| 0.3499        | 18.048 | 2256 | 0.3402          | 875104            |
| 0.3477        | 19.552 | 2444 | 0.3467          | 948912            |
| 0.3403        | 21.056 | 2632 | 0.3410          | 1021088           |
| 0.3391        | 22.560 | 2820 | 0.3406          | 1093760           |
| 0.3436        | 24.064 | 3008 | 0.3405          | 1167376           |
| 0.3310        | 25.568 | 3196 | 0.3400          | 1241056           |
| 0.3513        | 27.072 | 3384 | 0.3404          | 1314832           |
| 0.3608        | 28.576 | 3572 | 0.3432          | 1388480           |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1