train_wic_456_1760637804

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

Loss: 0.3411
Num Input Tokens Seen: 8434688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
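The learning-rate settings above can be illustrated with a small sketch. It assumes the common scheme of linear warmup followed by a single half-cosine decay to zero, and it takes the total step count (24440) from the final row of the training results. The names PEAK_LR, TOTAL_STEPS, WARMUP_STEPS and lr_at are hypothetical and are not taken from the training code, which this card does not show.

```python
import math

# Values copied from this card; the schedule shape itself is an assumption.
PEAK_LR = 0.03          # learning_rate
TOTAL_STEPS = 24440     # last "Step" value in the training results
WARMUP_RATIO = 0.1      # lr_scheduler_warmup_ratio
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    # Print the rate at a few steps that coincide with epoch boundaries.
    for step in (0, 1222, 2444, 13442, 24440):
        print(f"step {step:>6}: lr = {lr_at(step):.6f}")
```

Under these assumptions, warmup covers the first 2444 steps (exactly two epochs), the peak rate of 0.03 is reached at the end of epoch 2, and the rate has fallen to half the peak by the end of epoch 11.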

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3621	1.0	1222	0.3511	421520
0.3623	2.0	2444	0.3497	843032
0.341	3.0	3666	0.3455	1265032
0.3431	4.0	4888	0.3452	1687192
0.3231	5.0	6110	0.3473	2108776
0.3405	6.0	7332	0.3416	2530232
0.3271	7.0	8554	0.3427	2952296
0.3356	8.0	9776	0.3430	3374128
0.3411	9.0	10998	0.3442	3795712
0.3524	10.0	12220	0.3420	4217816
0.345	11.0	13442	0.3422	4639632
0.3404	12.0	14664	0.3411	5060952
0.3504	13.0	15886	0.3416	5482656
0.3331	14.0	17108	0.3422	5904024
0.3491	15.0	18330	0.3414	6325800
0.33	16.0	19552	0.3423	6747856
0.312	17.0	20774	0.3418	7169800
0.3529	18.0	21996	0.3423	7591280
0.3452	19.0	23218	0.3416	8013240
0.3385	20.0	24440	0.3414	8434688
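
A short sketch for reading the table: it finds the epoch with the lowest validation loss and the average number of input tokens per optimizer step. The values are copied from the rows above; the variable names are hypothetical.

```python
# Validation loss per epoch (epochs 1..20), copied from the table above.
VAL_LOSS = [
    0.3511, 0.3497, 0.3455, 0.3452, 0.3473, 0.3416, 0.3427, 0.3430, 0.3442, 0.3420,
    0.3422, 0.3411, 0.3416, 0.3422, 0.3414, 0.3423, 0.3418, 0.3423, 0.3416, 0.3414,
]
STEPS_PER_EPOCH = 1222       # step count at epoch 1.0
TOTAL_STEPS = 24440          # step count at epoch 20.0
TOTAL_TOKENS = 8434688       # input tokens seen at epoch 20.0

# Epoch (1-based) with the lowest validation loss, and its step number.
best_epoch = min(range(len(VAL_LOSS)), key=VAL_LOSS.__getitem__) + 1
best_loss = VAL_LOSS[best_epoch - 1]
best_step = best_epoch * STEPS_PER_EPOCH

# Average input tokens consumed per optimizer step over the whole run.
tokens_per_step = TOTAL_TOKENS / TOTAL_STEPS

if __name__ == "__main__":
    print(f"best epoch: {best_epoch} (step {best_step}), validation loss {best_loss:.4f}")
    print(f"final validation loss: {VAL_LOSS[-1]:.4f}")
    print(f"average input tokens per step: {tokens_per_step:.1f}")
```

The lowest validation loss, 0.3411 at epoch 12 (step 14664), matches the loss reported at the top of this card, whereas the final epoch ends at 0.3414. This may indicate that the reported figure comes from the best checkpoint rather than the last one, although the card does not say so.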

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4

Model tree for rbelanec/train_wic_456_1760637804

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model