# train_wic_123_1760637692
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the WiC (Word-in-Context) dataset. It achieves the following results on the evaluation set:
- Loss: 0.4619
- Num input tokens seen: 8,429,424
## Model description

More information needed

## Intended uses & limitations

More information needed
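No usage guidance was provided. Since the framework versions below list PEFT, here is a minimal inference sketch, assuming this repository hosts a PEFT adapter trained on top of the base model; the dtype, device placement, prompt, and generation settings are illustrative, not taken from this card:

```python
# Minimal inference sketch, assuming this repo contains a PEFT adapter
# for the base model; the prompt and generation settings are
# illustrative only (the training template is not documented here).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_123_1760637692"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)

prompt = (
    'Does the word "bank" mean the same thing in both sentences?\n'
    "1: She sat on the bank of the river.\n"
    "2: He deposited cash at the bank."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=8)
# Decode only the newly generated tokens.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```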
## Training and evaluation data

More information needed
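The data preparation for this run isn't documented, but WiC (Word-in-Context) is a binary word-sense disambiguation task from SuperGLUE. A minimal loading sketch, assuming the SuperGLUE configuration on the Hugging Face Hub is available to your version of the `datasets` library, with an illustrative prompt template:

```python
# Minimal sketch, assuming the SuperGLUE "wic" config loads with your
# datasets version; the prompt template is illustrative, not the one
# used for this run.
from datasets import load_dataset

wic = load_dataset("super_glue", "wic")  # splits: train / validation / test

ex = wic["train"][0]
# Fields include: word, sentence1, sentence2, label
# (plus character spans start1/end1 and start2/end2 marking the target word).
# label == 1 means the word is used in the same sense in both sentences.
prompt = (
    f'Does the word "{ex["word"]}" mean the same thing in both sentences?\n'
    f'1: {ex["sentence1"]}\n'
    f'2: {ex["sentence2"]}'
)
print(prompt, "->", "yes" if ex["label"] == 1 else "no")
```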
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
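A hedged sketch of how these values map onto `transformers.TrainingArguments`; `output_dir` is an assumption, and the PEFT/adapter configuration is not shown in this card:

```python
# Hedged sketch: mirrors the hyperparameters listed above.
# output_dir is assumed; PEFT/adapter settings are not documented here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_wic_123_1760637692",  # assumed name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```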
### Training results

The best validation loss, 0.4619, was reached at epoch 11 and matches the evaluation loss reported above.
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.319 | 1.0 | 1222 | 0.5033 | 421528 |
| 0.3012 | 2.0 | 2444 | 0.4831 | 843368 |
| 0.6519 | 3.0 | 3666 | 0.4750 | 1264408 |
| 0.3036 | 4.0 | 4888 | 0.4651 | 1685768 |
| 0.4584 | 5.0 | 6110 | 0.4697 | 2106968 |
| 0.4144 | 6.0 | 7332 | 0.4658 | 2528648 |
| 0.5252 | 7.0 | 8554 | 0.4675 | 2949592 |
| 0.3243 | 8.0 | 9776 | 0.4662 | 3371056 |
| 0.4506 | 9.0 | 10998 | 0.4687 | 3792672 |
| 0.5059 | 10.0 | 12220 | 0.4680 | 4213808 |
| 0.3959 | 11.0 | 13442 | 0.4619 | 4634936 |
| 0.4658 | 12.0 | 14664 | 0.4668 | 5056144 |
| 0.2719 | 13.0 | 15886 | 0.4651 | 5477344 |
| 0.387 | 14.0 | 17108 | 0.4630 | 5898504 |
| 0.4383 | 15.0 | 18330 | 0.4668 | 6320560 |
| 0.5965 | 16.0 | 19552 | 0.4665 | 6741824 |
| 0.6455 | 17.0 | 20774 | 0.4636 | 7163512 |
| 0.3475 | 18.0 | 21996 | 0.4631 | 7585736 |
| 0.6931 | 19.0 | 23218 | 0.4675 | 8007456 |
| 0.4062 | 20.0 | 24440 | 0.4681 | 8429424 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4