train_wsc_789_1760637880

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

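Loss: 0.3484 (the lowest validation loss in the training results, reached at epoch 12, step 1500)
Num Input Tokens Seen: 976592 (total at the end of epoch 20)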

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08, no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
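
To make the scheduler settings above concrete, here is a minimal sketch of a linear-warmup, cosine-decay learning-rate curve using the listed values (peak 0.03, warmup ratio 0.1). The total of 2500 steps is taken from the last row of the training results; the function name `lr_at`, the rounding of the warmup length, and the exact curve shape are assumptions for illustration, not the confirmed training code.

```python
import math

PEAK_LR = 0.03        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 2500    # final step in the training results table

# Assumption: the warmup length is the ratio times the total step count, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0 over the remaining steps.
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(step, lr_at(step))
```

Under these assumptions the warmup covers the first two epochs (250 steps), which is the range where the reported training loss falls most sharply.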

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
7.7829	1.0	125	5.3146	48896
0.4495	2.0	250	0.7327	97760
0.354	3.0	375	0.4015	146816
0.3678	4.0	500	0.3694	195376
0.4158	5.0	625	0.4583	244368
0.3997	6.0	750	0.3579	293088
0.3524	7.0	875	0.3502	341856
0.3817	8.0	1000	0.3707	390544
0.3803	9.0	1125	0.3625	439264
0.3775	10.0	1250	0.3508	487904
0.369	11.0	1375	0.3764	536960
0.3614	12.0	1500	0.3484	585712
0.3603	13.0	1625	0.3600	634464
0.3783	14.0	1750	0.3639	682800
0.3633	15.0	1875	0.3538	731376
0.3462	16.0	2000	0.3501	779936
0.3554	17.0	2125	0.3579	828880
0.3616	18.0	2250	0.3541	877920
0.3537	19.0	2375	0.3561	927488
0.3401	20.0	2500	0.3537	976592
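
As a quick check on the table above, the following sketch picks out the epoch with the lowest validation loss and computes the average number of input tokens per step. The values are copied from the table; the variable names are illustrative only.

```python
# (epoch, validation_loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 5.3146), (2, 0.7327), (3, 0.4015), (4, 0.3694), (5, 0.4583),
    (6, 0.3579), (7, 0.3502), (8, 0.3707), (9, 0.3625), (10, 0.3508),
    (11, 0.3764), (12, 0.3484), (13, 0.3600), (14, 0.3639), (15, 0.3538),
    (16, 0.3501), (17, 0.3579), (18, 0.3541), (19, 0.3561), (20, 0.3537),
]

TOTAL_STEPS = 2500           # last value in the Step column
TOTAL_INPUT_TOKENS = 976592  # last value in the Input Tokens Seen column

# Epoch with the lowest validation loss.
best_epoch, best_loss = min(VALIDATION_LOSS, key=lambda row: row[1])

# Average input tokens consumed per optimizer step over the whole run.
tokens_per_step = TOTAL_INPUT_TOKENS / TOTAL_STEPS

if __name__ == "__main__":
    print(f"best epoch: {best_epoch}, validation loss: {best_loss}")
    print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The minimum lands on epoch 12 with a validation loss of 0.3484, the same value reported as the evaluation loss at the top of this card. After epoch 6 the validation loss stays between roughly 0.35 and 0.38.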

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
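
Since the card lists PEFT among the framework versions and the model is published as an adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, loading it might look like the sketch below. It assumes the adapter repository id is rbelanec/train_wsc_789_1760637880, that you have access to the base model, and that default loading options are adequate; the function name `load_adapter_model` is hypothetical, and the card does not state which PEFT method was used.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wsc_789_1760637880"  # assumed adapter repo id


def load_adapter_model():
    """Load the base model and attach the adapter (hypothetical helper)."""
    # Imported lazily so this module can be imported without the libraries installed.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)

    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` downloads the full 8B base model, so it is best run on a machine with enough memory and an authenticated Hugging Face session.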

Model tree for rbelanec/train_wsc_789_1760637880

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
