train_wsc_789_1760637885

This model is a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct, fine-tuned on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.3555
Num Input Tokens Seen: 976592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
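The cosine schedule with a 0.1 warmup ratio can be sketched in plain Python to show what learning rate each optimizer step would receive. This is a minimal sketch that makes three assumptions. First, the schedule follows the linear-warmup-then-half-cosine shape of Transformers' `get_cosine_schedule_with_warmup`. Second, the warmup length is `ceil(warmup_ratio * total_steps)`. Third, training ran for the 2500 optimizer steps shown in the results table. The helper name `cosine_lr` is hypothetical.

```python
import math

# Values from the hyperparameter list; TOTAL_STEPS is the final "Step"
# value in the training results table (assumed to be the full run).
BASE_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 2500


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at a given optimizer step (hypothetical helper).

    Linear warmup from 0 to base_lr, then half-cosine decay to 0.
    Assumes warmup length = ceil(warmup_ratio * total_steps).
    """
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp: 0 at step 0, reaching base_lr at step == warmup_steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine factor: 1.0 at progress 0, 0.0 at progress 1.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for s in (0, 125, 250, 1375, 2500):
        print(f"step {s:>4}: lr = {cosine_lr(s):.3e}")
```

Under these assumptions, warmup lasts 250 steps, which is two epochs of 125 steps. The learning rate therefore peaks at 5e-05 at the end of epoch 2 and decays to zero by step 2500.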

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4065	1.0	125	0.6427	48896
0.4968	2.0	250	0.3884	97760
0.4411	3.0	375	0.3695	146816
0.4123	4.0	500	0.3634	195376
0.353	5.0	625	0.3580	244368
0.3876	6.0	750	0.3629	293088
0.3765	7.0	875	0.3609	341856
0.3445	8.0	1000	0.3579	390544
0.3626	9.0	1125	0.3575	439264
0.3633	10.0	1250	0.3573	487904
0.3418	11.0	1375	0.3568	536960
0.355	12.0	1500	0.3559	585712
0.35	13.0	1625	0.3597	634464
0.3156	14.0	1750	0.3577	682800
0.3448	15.0	1875	0.3575	731376
0.3442	16.0	2000	0.3593	779936
0.3741	17.0	2125	0.3555	828880
0.3578	18.0	2250	0.3555	877920
0.3391	19.0	2375	0.3587	927488
0.3329	20.0	2500	0.3578	976592
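The summary numbers at the top of the card can be cross-checked against this table. The following minimal Python sketch uses rows transcribed from the table above; the helper names are hypothetical. It finds the epoch(s) with the lowest validation loss, derives the number of optimizer steps per epoch from the cumulative Step column, and computes the average number of input tokens counted per step.

```python
# (training_loss, epoch, step, validation_loss, input_tokens_seen),
# transcribed from the training results table above.
RESULTS = [
    (0.4065, 1, 125, 0.6427, 48896),
    (0.4968, 2, 250, 0.3884, 97760),
    (0.4411, 3, 375, 0.3695, 146816),
    (0.4123, 4, 500, 0.3634, 195376),
    (0.3530, 5, 625, 0.3580, 244368),
    (0.3876, 6, 750, 0.3629, 293088),
    (0.3765, 7, 875, 0.3609, 341856),
    (0.3445, 8, 1000, 0.3579, 390544),
    (0.3626, 9, 1125, 0.3575, 439264),
    (0.3633, 10, 1250, 0.3573, 487904),
    (0.3418, 11, 1375, 0.3568, 536960),
    (0.3550, 12, 1500, 0.3559, 585712),
    (0.3500, 13, 1625, 0.3597, 634464),
    (0.3156, 14, 1750, 0.3577, 682800),
    (0.3448, 15, 1875, 0.3575, 731376),
    (0.3442, 16, 2000, 0.3593, 779936),
    (0.3741, 17, 2125, 0.3555, 828880),
    (0.3578, 18, 2250, 0.3555, 877920),
    (0.3391, 19, 2375, 0.3587, 927488),
    (0.3329, 20, 2500, 0.3578, 976592),
]


def best_validation(rows):
    """Return the lowest validation loss and every epoch that reached it."""
    best = min(r[3] for r in rows)
    return best, [r[1] for r in rows if r[3] == best]


def steps_per_epoch(rows):
    """Optimizer steps per epoch, derived from the cumulative Step column."""
    diffs = {later[2] - earlier[2] for earlier, later in zip(rows, rows[1:])}
    diffs.add(rows[0][2])  # steps taken during the first epoch
    if len(diffs) != 1:
        raise ValueError(f"uneven epochs: {sorted(diffs)}")
    return diffs.pop()


def tokens_per_step(rows):
    """Average input tokens counted per optimizer step over the whole run."""
    return rows[-1][4] / rows[-1][2]


if __name__ == "__main__":
    loss, epochs = best_validation(RESULTS)
    print(f"lowest validation loss {loss} at epoch(s) {epochs}")
    print(f"steps per epoch: {steps_per_epoch(RESULTS)}")
    print(f"input tokens per step: {tokens_per_step(RESULTS):.1f}")
```

On these rows, the lowest validation loss (0.3555) occurs at epochs 17 and 18. This matches the evaluation loss reported at the top of the card. With a train_batch_size of 4, 125 steps per epoch corresponds to about 500 training examples per epoch, but only if training used a single device without gradient accumulation. Neither setting is listed, so this is an assumption.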

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
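The framework list includes PEFT, and the model tree lists this repository as an adapter of meta-llama/Meta-Llama-3-8B-Instruct. This suggests the weights load as a PEFT adapter on top of the base model. The sketch below is a minimal example that assumes the repository contains a standard PEFT adapter checkpoint, that you have access to the gated Llama 3 base model, and that bfloat16 is an acceptable dtype. The helper name `load_adapter` is hypothetical.

```python
def load_adapter(adapter_id: str = "rbelanec/train_wsc_789_1760637885",
                 base_id: str = "meta-llama/Meta-Llama-3-8B-Instruct"):
    """Load the base model and apply the PEFT adapter on top of it.

    Imports are deferred so that defining this helper does not require
    torch, transformers, or peft; calling it downloads the weights.
    """
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the base model's tokenizer is the right one for the adapter.
    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
    # Wrap the base model with the adapter weights from the Hub repository.
    model = PeftModel.from_pretrained(base, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` returns a model ready for inference. The card does not document the prompt format used during training, so the inputs that reproduce the reported evaluation loss are unknown.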


Model tree

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: rbelanec/train_wsc_789_1760637885 (this model)