# train_wsc_42_1760637537
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wsc dataset. It achieves the following results on the evaluation set:
- Loss: 0.3487
- Num Input Tokens Seen: 985952
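Since this repo holds a PEFT adapter on top of the base model rather than full weights, inference should look roughly like the sketch below. This is a minimal sketch, assuming a LoRA-style adapter compatible with `PeftModel.from_pretrained`; the dtype, device placement, and prompt format are illustrative assumptions, not documented choices from this card.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_42_1760637537"  # this card's repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16-capable hardware; use float16 otherwise
    device_map="auto",
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# WSC-style coreference prompt (illustrative; the exact training prompt
# template is not documented in this card)
prompt = (
    "The trophy doesn't fit into the suitcase because it is too large. "
    "Does 'it' refer to the trophy? Answer yes or no."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```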
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
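For reference, the listed values map onto `transformers.TrainingArguments` roughly as below. This is a hedged sketch: only the hyperparameter values come from the list above, while the `output_dir` and any surrounding Trainer/dataset wiring are assumptions.

```python
from transformers import TrainingArguments

# Sketch of the training configuration implied by the hyperparameter list;
# output_dir is a hypothetical placeholder.
training_args = TrainingArguments(
    output_dir="train_wsc_42_1760637537",
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```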
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:----:|:---------------:|:-----------------:|
| 0.4341 | 1.0 | 125 | 0.5309 | 49104 |
| 0.3662 | 2.0 | 250 | 0.3506 | 98400 |
| 0.3839 | 3.0 | 375 | 0.3984 | 147712 |
| 0.3465 | 4.0 | 500 | 0.3456 | 196320 |
| 0.3797 | 5.0 | 625 | 0.3477 | 245520 |
| 0.3398 | 6.0 | 750 | 0.3491 | 294976 |
| 0.3475 | 7.0 | 875 | 0.3508 | 344320 |
| 0.3478 | 8.0 | 1000 | 0.3499 | 393840 |
| 0.3413 | 9.0 | 1125 | 0.3543 | 443168 |
| 0.359 | 10.0 | 1250 | 0.3474 | 492304 |
| 0.3569 | 11.0 | 1375 | 0.3524 | 541504 |
| 0.3462 | 12.0 | 1500 | 0.3518 | 590864 |
| 0.3429 | 13.0 | 1625 | 0.3499 | 640656 |
| 0.3358 | 14.0 | 1750 | 0.3460 | 689776 |
| 0.3672 | 15.0 | 1875 | 0.3513 | 739024 |
| 0.3471 | 16.0 | 2000 | 0.3508 | 788480 |
| 0.3369 | 17.0 | 2125 | 0.3467 | 837600 |
| 0.3458 | 18.0 | 2250 | 0.3458 | 887088 |
| 0.3577 | 19.0 | 2375 | 0.3504 | 936768 |
| 0.3583 | 20.0 | 2500 | 0.3469 | 985952 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4