train_wic_456_1760637806

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

Loss: 0.2461
Num Input Tokens Seen: 8434688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
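
As a rough illustration of what the cosine scheduler with a 0.1 warmup ratio implies for this run, the sketch below computes the learning rate at a given step. It assumes linear warmup followed by a single cosine half-cycle down to zero, and it assumes the warmup length is the warmup ratio times the total step count. The card does not state the exact schedule implementation, so treat the function as an approximation. The total of 24440 steps is the final step reported in the training results table; `lr_at` is a hypothetical helper name.

```python
import math

# Values copied from the hyperparameter list in this card.
LEARNING_RATE = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 24440  # final step reported in the training results table

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(WARMUP_RATIO * TOTAL_STEPS)


def lr_at(step: int) -> float:
    """Approximate learning rate at a step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return LEARNING_RATE * step / WARMUP_STEPS
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Assumption: one cosine half-cycle from the peak down to zero.
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1222, WARMUP_STEPS, 12220, TOTAL_STEPS):
        print(step, f"{lr_at(step):.3e}")
```

Under these assumptions the warmup covers 2444 steps, which is the first two epochs of the run, and the learning rate peaks at 5e-05 at the end of epoch 2.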

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.29	1.0	1222	0.2461	421520
0.2825	2.0	2444	0.2747	843032
0.1207	3.0	3666	0.2840	1265032
0.0992	4.0	4888	0.3801	1687192
0.0989	5.0	6110	0.3892	2108776
0.0005	6.0	7332	0.5116	2530232
0.0947	7.0	8554	0.7093	2952296
0.0	8.0	9776	0.7158	3374128
0.0	9.0	10998	0.6899	3795712
0.0	10.0	12220	0.9105	4217816
0.0051	11.0	13442	0.7330	4639632
0.0	12.0	14664	0.8850	5060952
0.0	13.0	15886	1.0206	5482656
0.0	14.0	17108	1.0846	5904024
0.0	15.0	18330	1.1273	6325800
0.0	16.0	19552	1.1668	6747856
0.0	17.0	20774	1.1959	7169800
0.0	18.0	21996	1.2175	7591280
0.0	19.0	23218	1.2244	8013240
0.0	20.0	24440	1.2365	8434688
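
The table can be checked programmatically. The sketch below copies the epoch and validation loss columns and finds the epoch with the lowest validation loss. The lowest value, 0.2461 at epoch 1, matches the evaluation loss reported at the top of this card. Validation loss then rises to 1.2365 by epoch 20 while training loss falls to 0.0, a pattern that is usually read as overfitting. Whether the published weights correspond to the epoch 1 checkpoint is not stated in the card. `best_epoch` is a hypothetical helper name.

```python
# (epoch, validation_loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 0.2461), (2, 0.2747), (3, 0.2840), (4, 0.3801), (5, 0.3892),
    (6, 0.5116), (7, 0.7093), (8, 0.7158), (9, 0.6899), (10, 0.9105),
    (11, 0.7330), (12, 0.8850), (13, 1.0206), (14, 1.0846), (15, 1.1273),
    (16, 1.1668), (17, 1.1959), (18, 1.2175), (19, 1.2244), (20, 1.2365),
]


def best_epoch(history):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    return min(history, key=lambda row: row[1])


best = best_epoch(VALIDATION_LOSS)
final = VALIDATION_LOSS[-1]

if __name__ == "__main__":
    print(f"best epoch: {best[0]} (validation loss {best[1]})")
    print(f"final epoch: {final[0]} (validation loss {final[1]})")
```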

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_wic_456_1760637806

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
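
Since the card lists PEFT among the framework versions and describes this model as an adapter of the base model, loading it might look like the minimal sketch below. It assumes the adapter can be attached with `PeftModel.from_pretrained`, that the tokenizer is taken from the base repository, and that you have access to the base model weights. `load_adapter_model` is a hypothetical helper name. The card does not document a prompt format, so no generation call is shown.

```python
# Repository identifiers taken from this card.
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wic_456_1760637806"


def load_adapter_model():
    """Load the base model and attach the adapter (downloads weights on first call)."""
    # Imported lazily so the constants above are usable without these packages.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Assumption: the tokenizer comes from the base repository.
    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)

    # Assumption: the adapter repository is in standard PEFT format.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads the 8B base model on first use, so it needs substantial disk space and memory.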