# train_wic_42_1767887009
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:
- Loss: 0.3170
- Num Input Tokens Seen: 4067384
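A minimal sketch of loading this adapter with PEFT on top of the base model. The adapter id comes from this card; the prompt template is an illustrative guess, since the exact WiC prompt format used in training is not documented here:

```python
# Minimal sketch: load the base model and apply the PEFT adapter from this repo.
# Assumes access to the gated meta-llama base model; adjust dtype/device_map
# for your hardware. The WiC-style prompt below is only an illustration.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_42_1767887009"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# WiC task: does the target word carry the same sense in both sentences?
prompt = (
    'Do the two sentences use the word "bank" in the same sense?\n'
    "1. She sat on the bank of the river.\n"
    "2. He deposited money at the bank.\n"
    "Answer:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```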
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
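The training script itself is not included in this card, so the following is only a hedged sketch of how the values above map onto `transformers.TrainingArguments`; `output_dir` and anything else not listed is a placeholder:

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Fields not covered by the list above (e.g. output_dir) are placeholders.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_42_1767887009",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```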
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.6066 | 0.5002 | 1222 | 0.3170 | 203312 |
| 0.096 | 1.0004 | 2444 | 0.3510 | 406960 |
| 0.6999 | 1.5006 | 3666 | 0.3758 | 610320 |
| 0.0235 | 2.0008 | 4888 | 0.3447 | 814208 |
| 0.2523 | 2.5010 | 6110 | 0.4392 | 1017424 |
| 0.4419 | 3.0012 | 7332 | 0.3659 | 1221232 |
| 0.2423 | 3.5014 | 8554 | 0.3769 | 1424736 |
| 0.1568 | 4.0016 | 9776 | 0.3933 | 1628304 |
| 0.411 | 4.5018 | 10998 | 0.3961 | 1831936 |
| 0.6195 | 5.0020 | 12220 | 0.3968 | 2035280 |
| 0.27 | 5.5023 | 13442 | 0.3906 | 2239168 |
| 0.1055 | 6.0025 | 14664 | 0.4025 | 2442032 |
| 0.2196 | 6.5027 | 15886 | 0.4531 | 2645696 |
| 0.4288 | 7.0029 | 17108 | 0.4529 | 2848960 |
| 0.0026 | 7.5031 | 18330 | 0.4631 | 3052368 |
| 0.388 | 8.0033 | 19552 | 0.4850 | 3255736 |
| 0.2477 | 8.5035 | 20774 | 0.4756 | 3459704 |
| 0.1481 | 9.0037 | 21996 | 0.4933 | 3662296 |
| 0.3577 | 9.5039 | 23218 | 0.4952 | 3866024 |
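Validation loss is lowest at the first evaluation (0.3170 at step 1222) and trends upward afterwards; this matches the headline evaluation loss above, suggesting the reported checkpoint is the one with the best validation loss rather than the final one.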
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4