train_wsc_101112_1760373109

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

  • Loss: 1.0793
  • Num Input Tokens Seen: 1471184

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08; no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 30
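
For reference, a hedged reconstruction of this configuration as Transformers TrainingArguments; the training script, dataset wiring, and PEFT setup are not documented on this card, so only the values listed above are filled in:

```python
from transformers import TrainingArguments

# Sketch of the listed hyperparameters only. output_dir is a placeholder, and
# anything not stated on this card (PEFT config, data collator, etc.) is omitted.
training_args = TrainingArguments(
    output_dir="train_wsc_101112_1760373109",  # placeholder
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=30,
)
```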

Training results

Training Loss   Epoch    Step   Validation Loss   Input Tokens Seen
0.3521          1.504     188   0.4224              74288
0.3545          3.008     376   0.3610             147040
0.3617          4.512     564   0.3474             221408
0.3523          6.016     752   0.3498             294736
0.3773          7.520     940   0.3696             368400
0.3547          9.024    1128   0.3556             441968
0.3446         10.528    1316   0.3625             514960
0.3445         12.032    1504   0.3630             588032
0.3342         13.536    1692   0.3680             662784
0.3396         15.040    1880   0.3744             735760
0.3547         16.544    2068   0.3644             809088
0.3341         18.048    2256   0.3969             883568
0.3488         19.552    2444   0.4179             958720
0.2865         21.056    2632   0.4510            1031776
0.2557         22.560    2820   0.5495            1105632
0.2666         24.064    3008   0.7044            1179856
0.2655         25.568    3196   0.8358            1253280
0.1862         27.072    3384   0.9530            1327824
0.2195         28.576    3572   1.0787            1400944

Validation loss reaches its minimum (0.3474) at step 564 and rises steadily thereafter while training loss continues to fall, suggesting overfitting past roughly epoch 5; earlier checkpoints likely generalize better than the final one.

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1