# train_wsc_123_1768397593
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:
- Loss: 0.3695
- Num Input Tokens Seen: 437760
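Since PEFT is listed under the framework versions below, the checkpoint is presumably a PEFT adapter on top of the base model. Below is a minimal loading sketch, assuming the adapter is hosted at `rbelanec/train_wsc_123_1768397593` (the repo id of this card) and that access to the gated base model has already been granted; the prompt format is illustrative, not the one used in training.

```python
# Minimal usage sketch (assumptions: the checkpoint is a PEFT adapter
# hosted under rbelanec/train_wsc_123_1768397593, and access to the
# gated base model is already granted).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wsc_123_1768397593"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

# WSC is a pronoun-coreference task; the exact prompt template used in
# training is not documented on this card, so this prompt is illustrative.
prompt = (
    "The trophy didn't fit in the suitcase because it was too big. "
    "What does 'it' refer to?"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```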
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
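In Hugging Face `Trainer` terms, these values map roughly onto the configuration below. This is a hedged sketch: the `output_dir` name, and the omitted dataset pipeline and PEFT/LoRA config, are assumptions; only the values listed above come from this card.

```python
# Hedged sketch of a TrainingArguments setup mirroring the listed
# hyperparameters; everything not listed on this card is assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wsc_123_1768397593",  # assumed output directory
    learning_rate=5e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```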
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.6166 | 0.5020 | 125 | 0.5235 | 22304 |
| 0.7843 | 1.0040 | 250 | 0.4796 | 44064 |
| 0.3906 | 1.5060 | 375 | 0.4048 | 65808 |
| 0.2747 | 2.0080 | 500 | 0.4026 | 88048 |
| 0.3692 | 2.5100 | 625 | 0.3930 | 109696 |
| 0.3471 | 3.0120 | 750 | 0.3742 | 131872 |
| 0.3897 | 3.5141 | 875 | 0.3695 | 154416 |
| 0.3295 | 4.0161 | 1000 | 0.3711 | 176048 |
| 0.3245 | 4.5181 | 1125 | 0.3906 | 198432 |
| 0.3318 | 5.0201 | 1250 | 0.3944 | 219680 |
| 0.365 | 5.5221 | 1375 | 0.3919 | 241136 |
| 0.365 | 6.0241 | 1500 | 0.3956 | 263616 |
| 0.3629 | 6.5261 | 1625 | 0.4064 | 285424 |
| 0.245 | 7.0281 | 1750 | 0.4294 | 307792 |
| 0.2859 | 7.5301 | 1875 | 0.4473 | 329840 |
| 0.3097 | 8.0321 | 2000 | 0.4224 | 351552 |
| 0.2745 | 8.5341 | 2125 | 0.4234 | 373424 |
| 0.3204 | 9.0361 | 2250 | 0.4284 | 395616 |
| 0.3417 | 9.5382 | 2375 | 0.4270 | 417520 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4