train_wsc_789_1760637882

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.3448
Num Input Tokens Seen: 976592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
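
To make the schedule concrete, here is a minimal sketch of a cosine decay with linear warmup using the values listed above. It assumes 2,500 total optimizer steps (the final step reported in the training results), a warmup length of 10% of that total, and a half-cosine decay to zero. The helper name `lr_at_step` is illustrative and not part of any library.

```python
import math

PEAK_LR = 5e-05        # learning_rate
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio
TOTAL_STEPS = 2500     # assumption: final step reported in the training results
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # 250 steps


def lr_at_step(step: int) -> float:
    """Learning rate after `step` optimizer updates (illustrative helper)."""
    if step < WARMUP_STEPS:
        # Linear warmup from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 125, 250, 1375, 2500):
    print(f"step {step:>4}: lr = {lr_at_step(step):.2e}")
```

Under these assumptions the learning rate peaks at the end of epoch 2 (step 250) and then decays for the remaining 18 epochs.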

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3477	1.0	125	0.3642	48896
0.3411	2.0	250	0.4001	97760
0.3957	3.0	375	0.3936	146816
0.3581	4.0	500	0.3551	195376
0.3515	5.0	625	0.3513	244368
0.3545	6.0	750	0.3448	293088
0.3494	7.0	875	0.3601	341856
0.3488	8.0	1000	0.3758	390544
0.3631	9.0	1125	0.3921	439264
0.3234	10.0	1250	0.4506	487904
0.3089	11.0	1375	0.5822	536960
0.3719	12.0	1500	1.1028	585712
0.1117	13.0	1625	1.5362	634464
0.2822	14.0	1750	2.0149	682800
0.2706	15.0	1875	2.1991	731376
0.1432	16.0	2000	2.5393	779936
0.2047	17.0	2125	2.7740	828880
0.0081	18.0	2250	2.9044	877920
0.1336	19.0	2375	2.9303	927488
0.0006	20.0	2500	2.9361	976592
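
The table shows validation loss bottoming out early and then rising sharply while training loss keeps falling, a typical overfitting pattern. The small sketch below, using only the validation-loss column copied from the table, locates the minimum. The variable names are illustrative.

```python
# Validation loss per epoch, copied from the training results table.
val_loss = [
    0.3642, 0.4001, 0.3936, 0.3551, 0.3513, 0.3448, 0.3601, 0.3758, 0.3921, 0.4506,
    0.5822, 1.1028, 1.5362, 2.0149, 2.1991, 2.5393, 2.7740, 2.9044, 2.9303, 2.9361,
]

# Epochs are 1-indexed in the table, so shift the list index by one.
best_epoch = min(range(len(val_loss)), key=val_loss.__getitem__) + 1
best_loss = val_loss[best_epoch - 1]
final_loss = val_loss[-1]

print(f"best epoch: {best_epoch}, validation loss: {best_loss}")
print(f"final epoch: {len(val_loss)}, validation loss: {final_loss}")
print(f"final / best ratio: {final_loss / best_loss:.1f}x")
```

The minimum (0.3448 at epoch 6, step 750) matches the evaluation loss reported at the top of this card. This suggests, though the card does not state it, that the reported figure comes from the best checkpoint rather than the final one.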

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
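
Since the framework list includes PEFT, the weights are presumably distributed as an adapter to be applied on top of the base model. The following is a hedged sketch of loading it, assuming the adapter is published under the repository id `rbelanec/train_wsc_789_1760637882`, that you have access to the base model, and that `transformers` and `peft` are installed. `load_adapter` is an illustrative helper, not part of either library.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_789_1760637882"  # assumption: adapter repo id


def load_adapter(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (illustrative helper)."""
    # Imported lazily so the heavy dependencies are only needed when called.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return model, tokenizer
```

Calling `model, tokenizer = load_adapter()` downloads both repositories on first use. The prompt format used during fine-tuning is not documented in this card, so any prompt passed to the model is a guess.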

Model tree for rbelanec/train_wsc_789_1760637882

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model