train_wic_123_1760637690

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic (Word-in-Context) dataset. It achieves the following results on the evaluation set:

  • Loss: 0.4493
  • Num Input Tokens Seen: 8429424
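
Since this repository contains a PEFT adapter rather than full model weights, it is loaded on top of the base model. A minimal loading sketch, assuming the adapter repo id is rbelanec/train_wic_123_1760637690 and that you have access to the gated base checkpoint; the dtype and device settings are illustrative:

```python
# Minimal sketch: attach the PEFT adapter to the Meta-Llama-3-8B-Instruct base.
# Assumes access to the gated base repo; adjust dtype/device_map for your setup.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_123_1760637690"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```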

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
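
If "wic" refers to the SuperGLUE Word-in-Context task (the usual dataset with this name), each instance pairs two sentences containing the same target word and labels whether the word is used in the same sense. A sketch of inspecting it with the datasets library, under that assumption (the exact Hub path may differ):

```python
# Sketch: load SuperGLUE WiC, assuming that is the "wic" dataset used here.
from datasets import load_dataset

wic = load_dataset("super_glue", "wic")
print(wic["train"][0])
# Fields include: word, sentence1, sentence2, and a binary same-sense label.
```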

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
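
For reference, these values map onto a transformers.TrainingArguments configuration roughly as follows. This is a reconstruction from the reported list, not the original training script; the PEFT/LoRA configuration and data preprocessing are omitted:

```python
# Sketch: TrainingArguments rebuilt from the hyperparameters reported above.
# adamw_torch defaults to betas=(0.9, 0.999) and eps=1e-08, matching the card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_123_1760637690",
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    lr_scheduler_type="cosine",  # cosine decay after the linear warmup
    warmup_ratio=0.1,            # first 10% of steps are warmup
    num_train_epochs=20,
)
```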

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.3466        | 1.0   | 1222  | 0.3523          | 421528            |
| 0.3339        | 2.0   | 2444  | 0.3515          | 843368            |
| 0.3207        | 3.0   | 3666  | 0.3690          | 1264408           |
| 0.3038        | 4.0   | 4888  | 0.3402          | 1685768           |
| 0.4083        | 5.0   | 6110  | 0.3275          | 2106968           |
| 0.3375        | 6.0   | 7332  | 0.3203          | 2528648           |
| 0.3314        | 7.0   | 8554  | 0.3162          | 2949592           |
| 0.2827        | 8.0   | 9776  | 0.3269          | 3371056           |
| 0.335         | 9.0   | 10998 | 0.3128          | 3792672           |
| 0.2672        | 10.0  | 12220 | 0.3146          | 4213808           |
| 0.3483        | 11.0  | 13442 | 0.3157          | 4634936           |
| 0.261         | 12.0  | 14664 | 0.3269          | 5056144           |
| 0.3125        | 13.0  | 15886 | 0.3427          | 5477344           |
| 0.2514        | 14.0  | 17108 | 0.3292          | 5898504           |
| 0.2833        | 15.0  | 18330 | 0.3314          | 6320560           |
| 0.0677        | 16.0  | 19552 | 0.3923          | 6741824           |
| 0.0826        | 17.0  | 20774 | 0.4226          | 7163512           |
| 0.1261        | 18.0  | 21996 | 0.4424          | 7585736           |
| 0.0771        | 19.0  | 23218 | 0.4665          | 8007456           |
| 0.1163        | 20.0  | 24440 | 0.4723          | 8429424           |

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
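
To reproduce this environment, the versions above can be pinned in a requirements file. Note that 2.9.0+cu128 denotes a CUDA 12.8 build of PyTorch, which typically comes from the PyTorch package index rather than PyPI:

```
# Pinned versions from this card; the torch +cu128 build needs the CUDA 12.8 index.
peft==0.17.1
transformers==4.51.3
torch==2.9.0
datasets==4.0.0
tokenizers==0.21.4
```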