# train_wsc_42_1760637536

This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wsc dataset. It achieves the following results on the evaluation set:
- Loss: 0.3549
- Num Input Tokens Seen: 985952
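Since this checkpoint is a PEFT adapter on top of Meta-Llama-3-8B-Instruct, it can be loaded with `peft`. A minimal sketch, assuming the adapter is published under this card's repo id and that access to the gated base model has been granted; the prompt format used during fine-tuning is not documented here, so the example prompt is illustrative:

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumption: the adapter lives under this card's repo id on the Hub.
adapter_id = "rbelanec/train_wsc_42_1760637536"

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
# Loads the base model and applies the adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(adapter_id, device_map="auto")

# Illustrative WSC-style coreference prompt; the actual training format is not documented here.
prompt = "The trophy doesn't fit in the suitcase because it is too large. What does 'it' refer to?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```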
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
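The card does not say which `wsc` variant was used; a common reading is the Winograd Schema Challenge as packaged in SuperGLUE. A minimal sketch under that assumption (on recent `datasets` versions the `super_glue` loader may require `trust_remote_code=True` or a Parquet-converted mirror):

```python
from datasets import load_dataset

# Assumption: "wsc" is the SuperGLUE Winograd Schema Challenge subset.
wsc = load_dataset("super_glue", "wsc")
example = wsc["train"][0]
# Typical fields: text, span1_text, span2_text, and a binary coreference label.
print(example)
```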
## Training procedure
### Training hyperparameters

The following hyperparameters were used during training (a sketch of an equivalent `TrainingArguments` configuration follows the list):
- learning_rate: 0.03
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
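The training script itself is not included in this card, but for orientation, here is a minimal sketch of how the listed hyperparameters might map onto `transformers.TrainingArguments`; the `output_dir` is illustrative, not taken from the actual run:

```python
from transformers import TrainingArguments

# Sketch only: mirrors the hyperparameters listed above.
args = TrainingArguments(
    output_dir="train_wsc_42_1760637536",  # illustrative
    learning_rate=0.03,  # unusually high for full fine-tuning; consistent with prompt-style PEFT
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```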
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 6.1999 | 1.0 | 125 | 5.0939 | 49104 |
| 0.5007 | 2.0 | 250 | 0.5214 | 98400 |
| 0.4231 | 3.0 | 375 | 0.3933 | 147712 |
| 0.371 | 4.0 | 500 | 0.3615 | 196320 |
| 0.3531 | 5.0 | 625 | 0.4567 | 245520 |
| 0.3797 | 6.0 | 750 | 0.4273 | 294976 |
| 0.3769 | 7.0 | 875 | 0.3809 | 344320 |
| 0.3729 | 8.0 | 1000 | 0.3906 | 393840 |
| 0.3446 | 9.0 | 1125 | 0.3553 | 443168 |
| 0.3702 | 10.0 | 1250 | 0.3585 | 492304 |
| 0.4168 | 11.0 | 1375 | 0.3591 | 541504 |
| 0.3828 | 12.0 | 1500 | 0.3549 | 590864 |
| 0.3636 | 13.0 | 1625 | 0.3585 | 640656 |
| 0.3249 | 14.0 | 1750 | 0.3583 | 689776 |
| 0.3689 | 15.0 | 1875 | 0.3554 | 739024 |
| 0.3544 | 16.0 | 2000 | 0.3621 | 788480 |
| 0.3224 | 17.0 | 2125 | 0.3574 | 837600 |
| 0.3574 | 18.0 | 2250 | 0.3581 | 887088 |
| 0.3582 | 19.0 | 2375 | 0.3589 | 936768 |
| 0.3613 | 20.0 | 2500 | 0.3601 | 985952 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4