train_wic_456_1760637805
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:
- Loss: 0.3739
- Num Input Tokens Seen: 8434688
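Since PEFT is listed among the framework versions below, the fine-tuned weights are presumably a PEFT adapter on top of the base model. Here is a minimal loading sketch, assuming the adapter is published as `rbelanec/train_wic_456_1760637805` and that you have access to the gated base model; the prompt template used during training is not documented in this card, so the prompt below is a placeholder:

```python
# Minimal inference sketch (assumptions: adapter repo id, access to the gated base model).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_456_1760637805"  # adapter repo, per this card's title

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

prompt = "..."  # placeholder; the training prompt template is not documented here
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```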
Model description
More information needed
Intended uses & limitations
More information needed
Training and evaluation data
More information needed
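If `wic` refers to the SuperGLUE Word-in-Context task (an assumption; the card does not say), the raw data can be inspected as below. Note that recent `datasets` releases removed support for script-based loaders, so loading the canonical `super_glue` configuration may require an older `datasets` version or a parquet mirror:

```python
# Sketch of inspecting the data, assuming "wic" is the SuperGLUE
# Word-in-Context task; how examples were serialized into prompts
# for fine-tuning is not documented in this card.
from datasets import load_dataset

wic = load_dataset("super_glue", "wic")  # may require an older `datasets` version
example = wic["train"][0]
# Each example pairs two sentences with a target word and a binary label:
# whether the word is used in the same sense in both sentences.
print(example["word"], "|", example["sentence1"], "|", example["sentence2"], "|", example["label"])
```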
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
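The training script itself is not included in this card; the following is a sketch of how the settings above map onto `transformers`' `TrainingArguments` (the output directory and the per-epoch evaluation cadence are assumptions, the latter inferred from the results table below):

```python
# Sketch only: reconstructs the reported hyperparameters as TrainingArguments.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_456_1760637805",  # assumed
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
    eval_strategy="epoch",    # assumed: the table reports validation loss once per epoch
    logging_strategy="epoch",
)
```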
Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.3101 | 1.0 | 1222 | 0.3184 | 421520 |
| 0.2245 | 2.0 | 2444 | 0.2564 | 843032 |
| 0.2296 | 3.0 | 3666 | 0.2356 | 1265032 |
| 0.1747 | 4.0 | 4888 | 0.2350 | 1687192 |
| 0.1759 | 5.0 | 6110 | 0.2247 | 2108776 |
| 0.1888 | 6.0 | 7332 | 0.2343 | 2530232 |
| 0.304 | 7.0 | 8554 | 0.2261 | 2952296 |
| 0.1299 | 8.0 | 9776 | 0.2323 | 3374128 |
| 0.1748 | 9.0 | 10998 | 0.2457 | 3795712 |
| 0.1812 | 10.0 | 12220 | 0.2425 | 4217816 |
| 0.221 | 11.0 | 13442 | 0.3058 | 4639632 |
| 0.0642 | 12.0 | 14664 | 0.3305 | 5060952 |
| 0.1347 | 13.0 | 15886 | 0.3712 | 5482656 |
| 0.1404 | 14.0 | 17108 | 0.4667 | 5904024 |
| 0.0056 | 15.0 | 18330 | 0.6260 | 6325800 |
| 0.0004 | 16.0 | 19552 | 0.7366 | 6747856 |
| 0.0005 | 17.0 | 20774 | 0.8914 | 7169800 |
| 0.0006 | 18.0 | 21996 | 0.9300 | 7591280 |
| 0.0004 | 19.0 | 23218 | 0.9629 | 8013240 |
| 0.0004 | 20.0 | 24440 | 0.9570 | 8434688 |
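Validation loss reaches its minimum at epoch 5 (0.2247) and climbs steadily afterwards while the training loss collapses toward zero, a typical overfitting pattern; checkpoints from the early epochs are therefore likely preferable to the final one.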
Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4