train_wic_42_1760637578

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

Loss: 0.2366
Num Input Tokens Seen: 8436720
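The framework versions on this card list PEFT, and the model page shows this repository as an adapter on the base model, so one plausible way to load it is to attach the adapter to meta-llama/Meta-Llama-3-8B-Instruct. The sketch below is an assumption about usage, not a documented recipe from the author: the function name `load_model` is hypothetical, the adapter ID is the repository name as shown on the model page, and the prompt format used during fine-tuning is not stated on this card, so no generation call is shown. Access to the base model may also require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
# Assumption: the adapter is published under the repository name shown on the model page.
ADAPTER_ID = "rbelanec/train_wic_42_1760637578"


def load_model(base_model_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Return (tokenizer, model) with the PEFT adapter attached to the base model."""
    # Imported lazily so this file can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_model()` downloads the full 8B base model, so it needs network access and enough memory for the base weights.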

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
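To make the schedule concrete, here is a minimal sketch of a cosine learning-rate schedule with linear warmup, using the learning rate and warmup ratio listed above. It assumes 24440 total optimizer steps (the final step count in the training results table) and rounds the warmup length to the nearest step. The training framework's own rounding and implementation may differ, and the function name `cosine_lr` is hypothetical.

```python
import math

BASE_LR = 5e-05        # learning_rate from the list above
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the list above
TOTAL_STEPS = 24440    # assumption: final step count in the training results table


def cosine_lr(step: int,
              base_lr: float = BASE_LR,
              total_steps: int = TOTAL_STEPS,
              warmup_ratio: float = WARMUP_RATIO) -> float:
    """Learning rate at `step` for linear warmup followed by cosine decay to zero."""
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1222, 2444, 12220, 24440):
        print(f"step {step:>5}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions, warmup covers the first 2444 steps (two epochs of 1222 steps each), after which the rate decays from 5e-05 toward zero at step 24440.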

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3007	1.0	1222	0.3212	421584
0.3622	2.0	2444	0.2898	843680
0.2031	3.0	3666	0.2680	1265424
0.3126	4.0	4888	0.2567	1687040
0.2698	5.0	6110	0.2502	2108784
0.2302	6.0	7332	0.2470	2530720
0.1633	7.0	8554	0.2408	2952752
0.2736	8.0	9776	0.2422	3374632
0.2144	9.0	10998	0.2405	3795800
0.2546	10.0	12220	0.2373	4217096
0.4760	11.0	13442	0.2396	4639384
0.2250	12.0	14664	0.2366	5061608
0.2875	13.0	15886	0.2392	5483544
0.2158	14.0	17108	0.2378	5905728
0.1944	15.0	18330	0.2414	6327600
0.1244	16.0	19552	0.2386	6749312
0.2328	17.0	20774	0.2396	7171032
0.3336	18.0	21996	0.2403	7592384
0.1747	19.0	23218	0.2391	8014496
0.2127	20.0	24440	0.2388	8436720
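The evaluation loss reported at the top of this card (0.2366) equals the lowest validation loss in the table, reached at epoch 12, rather than the final-epoch value (0.2388). The short sketch below reproduces that lookup from the table's values. Whether the published weights correspond to the epoch 12 checkpoint is not stated on this card, and the helper name `best_epoch` is hypothetical.

```python
# (epoch, validation_loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 0.3212), (2, 0.2898), (3, 0.2680), (4, 0.2567), (5, 0.2502),
    (6, 0.2470), (7, 0.2408), (8, 0.2422), (9, 0.2405), (10, 0.2373),
    (11, 0.2396), (12, 0.2366), (13, 0.2392), (14, 0.2378), (15, 0.2414),
    (16, 0.2386), (17, 0.2396), (18, 0.2403), (19, 0.2391), (20, 0.2388),
]


def best_epoch(rows):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss = best_epoch(VALIDATION_LOSS)
    print(f"lowest validation loss {loss:.4f} at epoch {epoch}")
```

Validation loss changes little after roughly epoch 10, staying between 0.2366 and 0.2414, so the later epochs bring no clear further improvement on this metric.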

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
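When trying to reproduce the run, it can help to compare the locally installed packages against the versions listed above. The following is a small sketch using only the standard library. The PyPI distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumptions mapped from the names on this card, and `compare_versions` is a hypothetical helper.

```python
from importlib import metadata

# Versions as listed on this card, keyed by the assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected_version, installed_version or None)."""
    report = {}
    for package, wanted in expected.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[package] = (wanted, installed)
    return report


if __name__ == "__main__":
    for package, (wanted, installed) in compare_versions().items():
        status = "ok" if installed == wanted else "differs"
        print(f"{package}: expected {wanted}, installed {installed} ({status})")
```

A mismatch does not necessarily prevent loading the adapter; the check only flags where the environment differs from the one recorded here.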

Model tree for rbelanec/train_wic_42_1760637578

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model