train_wic_789_1760637921

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

Loss: 0.4803
Num Input Tokens Seen: 8431032

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9,0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
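
To make the cosine schedule with a 0.1 warmup ratio concrete, here is a minimal sketch of how the learning rate would evolve over the run. It assumes linear warmup followed by a half-cosine decay to zero, and it takes the total step count (24440) from the last row of the training results table. The function name `cosine_lr` is hypothetical, and the card does not state gradient accumulation or device count, so the real schedule may differ in detail.

```python
import math

# Values taken from the card: learning_rate, warmup ratio, and the final step count.
BASE_LR = 5e-5
WARMUP_RATIO = 0.1
TOTAL_STEPS = 24440  # last row of the training results table


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Linear warmup to base_lr, then half-cosine decay to zero (illustrative sketch)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Warmup phase: scale linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Decay phase: progress runs from 0 (end of warmup) to 1 (end of training).
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 1222, 2444, 12220, 24440):
        print(f"step {step:>6}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the learning rate peaks around the end of epoch 2 and decays toward zero by step 24440.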

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.4524	1.0	1222	0.5212	421768
0.7725	2.0	2444	0.4988	843296
0.5933	3.0	3666	0.4898	1265072
0.7239	4.0	4888	0.4816	1687136
0.3341	5.0	6110	0.4887	2108680
0.7349	6.0	7332	0.4809	2530168
0.3534	7.0	8554	0.4803	2951208
0.5553	8.0	9776	0.4807	3372504
0.4469	9.0	10998	0.4881	3793768
0.3666	10.0	12220	0.4859	4214928
0.5461	11.0	13442	0.4833	4636520
0.3518	12.0	14664	0.4861	5057560
0.5114	13.0	15886	0.4805	5479248
0.3864	14.0	17108	0.4860	5901056
0.5046	15.0	18330	0.4898	6323016
0.4585	16.0	19552	0.4831	6744792
0.4453	17.0	20774	0.4874	7165960
0.5392	18.0	21996	0.4807	7587872
0.5631	19.0	23218	0.4824	8009040
0.3830	20.0	24440	0.4824	8431032
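
The table can be summarized with a few lines of Python. The sketch below copies the validation losses from the table, finds the epoch with the lowest value, and derives the average number of input tokens per optimizer step. The helper name `best_epoch` is hypothetical. The lowest validation loss in the table equals the evaluation loss reported at the top of the card, whereas the final epoch ends at 0.4824; this may mean the reported result comes from the best checkpoint rather than the last one, though the card does not say so.

```python
# Validation loss per epoch, copied from the training results table (epochs 1..20).
VAL_LOSS = [
    0.5212, 0.4988, 0.4898, 0.4816, 0.4887,
    0.4809, 0.4803, 0.4807, 0.4881, 0.4859,
    0.4833, 0.4861, 0.4805, 0.4860, 0.4898,
    0.4831, 0.4874, 0.4807, 0.4824, 0.4824,
]
INPUT_TOKENS_SEEN = 8431032  # final "Input Tokens Seen" value
TOTAL_STEPS = 24440          # final "Step" value


def best_epoch(losses):
    """Return (1-based epoch, loss) for the lowest validation loss."""
    idx = min(range(len(losses)), key=losses.__getitem__)
    return idx + 1, losses[idx]


epoch, loss = best_epoch(VAL_LOSS)
tokens_per_step = INPUT_TOKENS_SEEN / TOTAL_STEPS

print(f"best epoch: {epoch}, validation loss: {loss}")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

After epoch 4 the validation loss stays within a narrow band (roughly 0.480 to 0.490), so the later epochs add little measurable improvement on this metric.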

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
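
Since the card lists PEFT among the framework versions and the model tree labels this repository as an adapter, loading it presumably follows the usual PEFT pattern: load the base model first, then attach the adapter weights. The sketch below is an assumption-based example, not the author's documented usage. The function name `load_adapter_model` is hypothetical, the repository id is taken from the model tree, and the card does not document the prompt format used for training, so none is shown here. Access to the base model weights may require accepting its license on the Hub.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wic_789_1760637921"  # repository id from the model tree


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model, then attach the PEFT adapter weights on top of it."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    # Assumes the adapter repository contains a standard PEFT adapter config and weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return model, tokenizer
```

Calling `load_adapter_model()` downloads both the base model and the adapter, so it needs network access and enough memory for an 8B-parameter model.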

Model tree for rbelanec/train_wic_789_1760637921

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model