train_wsc_789_1760637884

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

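Loss: 0.5816 (the validation loss at epoch 16, step 2000, which is the lowest value in the training results table; the final-epoch validation loss is 0.5868)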
Num Input Tokens Seen: 976592

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
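
As a rough illustration of what the cosine scheduler and warmup ratio above imply, the sketch below computes the learning rate at a given optimizer step. It assumes linear warmup followed by a single half-cosine decay to zero, takes the total of 2500 steps from the training results table, and rounds the warmup length to 250 steps. The names `lr_at`, `PEAK_LR`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical helpers for this example, not part of the training code.

```python
import math

PEAK_LR = 5e-05      # learning_rate from the hyperparameter list
TOTAL_STEPS = 2500   # last step in the training results table (20 epochs x 125 steps)
WARMUP_RATIO = 0.1   # lr_scheduler_warmup_ratio
# Assumption: warmup length is the ratio times the total steps, rounded.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at `step`, assuming linear warmup then half-cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(f"step {step:>4}: lr = {lr_at(step):.2e}")
```

Under these assumptions the peak rate is reached at the end of epoch 2 (step 250) and has decayed to half its peak by epoch 11 (step 1375).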

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4207	1.0	125	0.6380	48896
0.5887	2.0	250	0.6308	97760
0.9575	3.0	375	0.6143	146816
0.5472	4.0	500	0.6036	195376
0.5809	5.0	625	0.6037	244368
0.7610	6.0	750	0.5903	293088
0.6357	7.0	875	0.5836	341856
0.4514	8.0	1000	0.5884	390544
0.7934	9.0	1125	0.5911	439264
0.6754	10.0	1250	0.5825	487904
0.3472	11.0	1375	0.5829	536960
0.8128	12.0	1500	0.5905	585712
0.6479	13.0	1625	0.5839	634464
0.3784	14.0	1750	0.5882	682800
0.3900	15.0	1875	0.5818	731376
0.5261	16.0	2000	0.5816	779936
0.6438	17.0	2125	0.5899	828880
0.5272	18.0	2250	0.5834	877920
0.5616	19.0	2375	0.5873	927488
0.5713	20.0	2500	0.5868	976592
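
To make the table easier to inspect, here is a small sketch that parses the rows above and picks out the epoch with the lowest validation loss. The values are copied from the table; the names `RESULTS`, `parse_rows`, and `best` are hypothetical and only serve this example.

```python
# Rows copied from the training results table:
# training loss, epoch, step, validation loss, input tokens seen.
RESULTS = """\
0.4207 1.0 125 0.6380 48896
0.5887 2.0 250 0.6308 97760
0.9575 3.0 375 0.6143 146816
0.5472 4.0 500 0.6036 195376
0.5809 5.0 625 0.6037 244368
0.761 6.0 750 0.5903 293088
0.6357 7.0 875 0.5836 341856
0.4514 8.0 1000 0.5884 390544
0.7934 9.0 1125 0.5911 439264
0.6754 10.0 1250 0.5825 487904
0.3472 11.0 1375 0.5829 536960
0.8128 12.0 1500 0.5905 585712
0.6479 13.0 1625 0.5839 634464
0.3784 14.0 1750 0.5882 682800
0.39 15.0 1875 0.5818 731376
0.5261 16.0 2000 0.5816 779936
0.6438 17.0 2125 0.5899 828880
0.5272 18.0 2250 0.5834 877920
0.5616 19.0 2375 0.5873 927488
0.5713 20.0 2500 0.5868 976592
"""


def parse_rows(text: str) -> list[dict]:
    """Turn whitespace-separated table rows into a list of dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


rows = parse_rows(RESULTS)
# The row with the smallest validation loss.
best = min(rows, key=lambda r: r["val_loss"])

if __name__ == "__main__":
    print(f"best epoch: {best['epoch']}, step: {best['step']}, "
          f"validation loss: {best['val_loss']}")
    print(f"final validation loss: {rows[-1]['val_loss']}")
```

The validation loss flattens out from roughly epoch 7 onward, while the per-step training loss stays noisy, so the minimum at epoch 16 is only marginally better than its neighbours.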

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
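
Since the card lists PEFT among the framework versions, the weights are presumably published as a PEFT adapter on top of the base model. The following is a minimal loading sketch under that assumption. The repository id is taken from this page, the helper name `load_adapter_model` is hypothetical, and access to the base model may require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_789_1760637884"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter (assumes a PEFT adapter repo)."""
    # Imported lazily so that defining this helper does not require the
    # libraries to be installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads the 8B base model, so it needs sufficient memory and disk space. The prompt format used during fine-tuning is not documented on this card.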

Model tree for rbelanec/train_wsc_789_1760637884

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter