train_wsc_456_1760637770

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Loss: 0.3414
Num Input Tokens Seen: 970208

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
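The cosine learning-rate schedule implied by these settings can be reproduced in a few lines. This is a minimal sketch under three assumptions the card does not state. First, the schedule follows the linear-warmup, half-cosine-decay form of Transformers' `get_cosine_schedule_with_warmup` with its default `num_cycles=0.5`. Second, warmup steps are computed as `ceil(warmup_ratio x total_steps)`. Third, the total of 2500 optimizer steps (125 per epoch x 20 epochs) is read off the training-results table.

```python
import math

# Values taken from the hyperparameter list above.
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1

# Assumption: 125 optimizer steps per epoch x 20 epochs, as in the training-results table.
TOTAL_STEPS = 125 * 20  # 2500

# Assumption: warmup length is rounded up, as transformers.TrainingArguments does for warmup_ratio.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)  # 250


def cosine_lr(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then half-cosine decay to 0."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 125, 250, 1375, 2500):
        print(f"step {step:>4}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the rate peaks at 5e-05 at the end of epoch 2 (step 250), falls to half of that at step 1375, and reaches 0 at the final step.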

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.5279	1.0	125	0.6950	48240
0.44	2.0	250	0.3622	96896
0.2948	3.0	375	0.3494	145184
0.3459	4.0	500	0.3627	194384
0.3724	5.0	625	0.3450	242624
0.3335	6.0	750	0.3508	291216
0.3562	7.0	875	0.3521	339568
0.3763	8.0	1000	0.3427	388576
0.3588	9.0	1125	0.3517	436656
0.3536	10.0	1250	0.3473	485152
0.3628	11.0	1375	0.3417	533200
0.3447	12.0	1500	0.3462	581792
0.3736	13.0	1625	0.3416	630384
0.3402	14.0	1750	0.3414	678480
0.3445	15.0	1875	0.3427	727056
0.3719	16.0	2000	0.3429	775168
0.3657	17.0	2125	0.3461	824240
0.3416	18.0	2250	0.3435	872896
0.3578	19.0	2375	0.3431	921296
0.3453	20.0	2500	0.3439	970208
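The evaluation loss quoted at the top of this card (0.3414) matches the lowest validation loss in the table, reached at epoch 14 (step 1750), rather than the final-epoch value (0.3439). This may mean the best checkpoint was kept instead of the last one, but the card does not say so. The following sketch copies the table's epoch, step, and validation-loss columns into Python and makes the comparison explicit. The `EvalRow` type and `best_row` helper are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class EvalRow(NamedTuple):
    epoch: float
    step: int
    validation_loss: float


# (epoch, step, validation loss) copied from the training-results table above.
ROWS = [
    EvalRow(1.0, 125, 0.6950), EvalRow(2.0, 250, 0.3622),
    EvalRow(3.0, 375, 0.3494), EvalRow(4.0, 500, 0.3627),
    EvalRow(5.0, 625, 0.3450), EvalRow(6.0, 750, 0.3508),
    EvalRow(7.0, 875, 0.3521), EvalRow(8.0, 1000, 0.3427),
    EvalRow(9.0, 1125, 0.3517), EvalRow(10.0, 1250, 0.3473),
    EvalRow(11.0, 1375, 0.3417), EvalRow(12.0, 1500, 0.3462),
    EvalRow(13.0, 1625, 0.3416), EvalRow(14.0, 1750, 0.3414),
    EvalRow(15.0, 1875, 0.3427), EvalRow(16.0, 2000, 0.3429),
    EvalRow(17.0, 2125, 0.3461), EvalRow(18.0, 2250, 0.3435),
    EvalRow(19.0, 2375, 0.3431), EvalRow(20.0, 2500, 0.3439),
]


def best_row(rows: list[EvalRow]) -> EvalRow:
    """Return the evaluation row with the lowest validation loss."""
    return min(rows, key=lambda r: r.validation_loss)


if __name__ == "__main__":
    best, last = best_row(ROWS), ROWS[-1]
    print(f"best: epoch {best.epoch}, step {best.step}, val loss {best.validation_loss}")
    print(f"last: epoch {last.epoch}, step {last.step}, val loss {last.validation_loss}")
```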

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_wsc_456_1760637770

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter of the base model