# train_wic_42_1760637577
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the WiC (Word-in-Context) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4767
- Num Input Tokens Seen: 8436720
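Since PEFT appears in the framework versions below, this checkpoint is presumably a PEFT adapter on top of the base model rather than full weights. Below is a minimal loading sketch, assuming the adapter is hosted at `rbelanec/train_wic_42_1760637577`; the prompt is a hypothetical placeholder, since the card does not document the template used during training.

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Assumption: the adapter is published under this hub id.
adapter_id = "rbelanec/train_wic_42_1760637577"

# AutoPeftModelForCausalLM reads the base model recorded in the adapter
# config (meta-llama/Meta-Llama-3-8B-Instruct) and attaches the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical WiC-style query; the actual training prompt is undocumented.
prompt = 'Does "run" have the same meaning in both of these sentences?'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```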
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
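For reference, WiC (Word-in-Context) is distributed as part of SuperGLUE; a hedged loading sketch with the `datasets` library follows. Whether this exact config was used for this run is an assumption, and on newer `datasets` releases the script-based `super_glue` loader may require a parquet mirror.

```python
from datasets import load_dataset

# Assumption: the "wic" referenced above is SuperGLUE's Word-in-Context task.
wic = load_dataset("super_glue", "wic")

# Each example pairs two sentences with a target word and a binary label:
# 1 if the word is used in the same sense in both sentences, else 0.
ex = wic["train"][0]
print(ex["word"], "|", ex["sentence1"], "|", ex["sentence2"], "|", ex["label"])
```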
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (see the sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
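As a rough reconstruction, these arguments map onto `transformers.TrainingArguments` as sketched below; the actual training script, data preprocessing, and PEFT configuration are not documented in this card.

```python
from transformers import TrainingArguments

# Approximate mirror of the hyperparameters listed above; the trainer,
# collator, and PEFT config used for this run are not documented here.
args = TrainingArguments(
    output_dir="train_wic_42_1760637577",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```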
### Training results

The best validation loss, 0.4767 (the evaluation loss reported at the top of this card), was reached at epoch 13.
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.5077 | 1.0 | 1222 | 0.5073 | 421584 |
| 0.6103 | 2.0 | 2444 | 0.4966 | 843680 |
| 0.347 | 3.0 | 3666 | 0.4791 | 1265424 |
| 0.4852 | 4.0 | 4888 | 0.4831 | 1687040 |
| 0.5648 | 5.0 | 6110 | 0.4832 | 2108784 |
| 0.4625 | 6.0 | 7332 | 0.4819 | 2530720 |
| 0.3368 | 7.0 | 8554 | 0.4861 | 2952752 |
| 0.8262 | 8.0 | 9776 | 0.4816 | 3374632 |
| 0.3016 | 9.0 | 10998 | 0.4842 | 3795800 |
| 0.3625 | 10.0 | 12220 | 0.4823 | 4217096 |
| 0.5306 | 11.0 | 13442 | 0.4826 | 4639384 |
| 0.5803 | 12.0 | 14664 | 0.4796 | 5061608 |
| 0.4927 | 13.0 | 15886 | 0.4767 | 5483544 |
| 0.5782 | 14.0 | 17108 | 0.4826 | 5905728 |
| 0.4342 | 15.0 | 18330 | 0.4843 | 6327600 |
| 0.4527 | 16.0 | 19552 | 0.4849 | 6749312 |
| 0.3874 | 17.0 | 20774 | 0.4866 | 7171032 |
| 0.6285 | 18.0 | 21996 | 0.4860 | 7592384 |
| 0.484 | 19.0 | 23218 | 0.4860 | 8014496 |
| 0.5019 | 20.0 | 24440 | 0.4860 | 8436720 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4