# train_wsc_456_1760637766
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:
- Loss: 0.3264
- Num Input Tokens Seen: 970208
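The snippet below is a minimal usage sketch, not an official example: it assumes the adapter is published under `rbelanec/train_wsc_456_1760637766` (the hub path for this card), that you have access to the gated base model meta-llama/Meta-Llama-3-8B-Instruct, and that a plain-text Winograd-style prompt is an acceptable input format (the exact prompt template used in training is not documented here).

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumed hub path for this adapter; loading it pulls the base model automatically.
model_id = "rbelanec/train_wsc_456_1760637766"

model = AutoPeftModelForCausalLM.from_pretrained(model_id, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical WSC-style prompt; adjust to match the training format.
prompt = "The trophy doesn't fit in the suitcase because it is too big. What is too big?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```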
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (mirrored in the configuration sketch after this list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
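As a rough reconstruction (assuming the run used the Hugging Face `Trainer` API; the dataset pipeline and the PEFT adapter config are omitted), the hyperparameters above map onto `TrainingArguments` like this:

```python
from transformers import TrainingArguments

# Sketch of the run configuration listed above; output_dir is hypothetical.
args = TrainingArguments(
    output_dir="train_wsc_456_1760637766",
    learning_rate=0.03,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```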
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3872 | 1.0 | 125 | 0.4011 | 48240 |
| 0.3646 | 2.0 | 250 | 0.3264 | 96896 |
| 0.3326 | 3.0 | 375 | 0.3331 | 145184 |
| 0.3654 | 4.0 | 500 | 0.3474 | 194384 |
| 0.342 | 5.0 | 625 | 0.3395 | 242624 |
| 0.3459 | 6.0 | 750 | 0.3464 | 291216 |
| 0.3578 | 7.0 | 875 | 0.3458 | 339568 |
| 0.3548 | 8.0 | 1000 | 0.3402 | 388576 |
| 0.3445 | 9.0 | 1125 | 0.3481 | 436656 |
| 0.346 | 10.0 | 1250 | 0.3475 | 485152 |
| 0.3577 | 11.0 | 1375 | 0.3428 | 533200 |
| 0.3497 | 12.0 | 1500 | 0.3442 | 581792 |
| 0.3477 | 13.0 | 1625 | 0.3456 | 630384 |
| 0.3373 | 14.0 | 1750 | 0.3405 | 678480 |
| 0.3576 | 15.0 | 1875 | 0.3422 | 727056 |
| 0.3545 | 16.0 | 2000 | 0.3420 | 775168 |
| 0.3494 | 17.0 | 2125 | 0.3441 | 824240 |
| 0.3497 | 18.0 | 2250 | 0.3402 | 872896 |
| 0.3463 | 19.0 | 2375 | 0.3430 | 921296 |
| 0.3404 | 20.0 | 2500 | 0.3441 | 970208 |
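Note that the lowest validation loss (0.3264) was reached at epoch 2 and matches the evaluation loss reported at the top of this card, which suggests the released weights correspond to that checkpoint (e.g. via `load_best_model_at_end`) rather than to the final epoch.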
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
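When reproducing the run, it may help to verify the environment against the pinned versions above; a small (hypothetical) check:

```python
import datasets, peft, tokenizers, torch, transformers

# Versions this card reports; warn on mismatch instead of failing hard.
expected = {
    peft: "0.17.1",
    transformers: "4.51.3",
    torch: "2.9.0+cu128",
    datasets: "4.0.0",
    tokenizers: "0.21.4",
}
for module, version in expected.items():
    if module.__version__ != version:
        print(f"warning: {module.__name__} is {module.__version__}, card used {version}")
```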