train_wic_789_1760637918

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

Loss: 0.3416
Num Input Tokens Seen: 8431032

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
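
As an illustration of how the cosine schedule and warmup ratio listed above interact, the sketch below computes the learning rate at a given optimizer step. It assumes the total step count is the final step in the training results table (24440), that warmup covers 10% of those steps (2444), and that the schedule follows the usual linear-warmup, half-cosine-decay form. The card does not state the exact scheduler implementation, so the function `lr_at` and its constants are illustrative names only.

```python
import math

PEAK_LR = 0.03        # learning_rate from the hyperparameter list
TOTAL_STEPS = 24440   # assumed: final step reported in the training results
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)  # assumed: 2444 steps


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the decay phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak learning rate down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1222, 2444, 13442, 24440):
        print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the learning rate peaks at the end of epoch 2 (step 2444) and decays toward zero by step 24440.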

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3708	1.0	1222	0.3733	421768
0.3019	2.0	2444	0.3525	843296
0.3389	3.0	3666	0.3579	1265072
0.3524	4.0	4888	0.3498	1687136
0.3418	5.0	6110	0.3608	2108680
0.3994	6.0	7332	0.3656	2530168
0.3609	7.0	8554	0.3498	2951208
0.3591	8.0	9776	0.3432	3372504
0.3286	9.0	10998	0.3435	3793768
0.3725	10.0	12220	0.3433	4214928
0.3660	11.0	13442	0.3461	4636520
0.3456	12.0	14664	0.3423	5057560
0.3513	13.0	15886	0.3416	5479248
0.3413	14.0	17108	0.3492	5901056
0.3378	15.0	18330	0.3466	6323016
0.3189	16.0	19552	0.3435	6744792
0.3355	17.0	20774	0.3431	7165960
0.3558	18.0	21996	0.3427	7587872
0.3131	19.0	23218	0.3424	8009040
0.3187	20.0	24440	0.3444	8431032
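
To make the table easier to interpret, the sketch below picks out the epoch with the lowest validation loss and compares it with the final epoch. The values are copied from the table above; the variable names are illustrative. The lowest validation loss matches the evaluation loss reported at the top of the card, which suggests (the card does not say so explicitly) that the reported result comes from the best-validation checkpoint and not the last one.

```python
# (epoch, training_loss, validation_loss) copied from the training results table
RESULTS = [
    (1, 0.3708, 0.3733), (2, 0.3019, 0.3525), (3, 0.3389, 0.3579),
    (4, 0.3524, 0.3498), (5, 0.3418, 0.3608), (6, 0.3994, 0.3656),
    (7, 0.3609, 0.3498), (8, 0.3591, 0.3432), (9, 0.3286, 0.3435),
    (10, 0.3725, 0.3433), (11, 0.366, 0.3461), (12, 0.3456, 0.3423),
    (13, 0.3513, 0.3416), (14, 0.3413, 0.3492), (15, 0.3378, 0.3466),
    (16, 0.3189, 0.3435), (17, 0.3355, 0.3431), (18, 0.3558, 0.3427),
    (19, 0.3131, 0.3424), (20, 0.3187, 0.3444),
]

# Epoch with the lowest validation loss.
best_epoch, _, best_val = min(RESULTS, key=lambda row: row[2])
# Last epoch of training.
final_epoch, _, final_val = RESULTS[-1]

if __name__ == "__main__":
    print(f"best epoch:  {best_epoch} (validation loss {best_val})")
    print(f"final epoch: {final_epoch} (validation loss {final_val})")
```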

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
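
Since the card lists PEFT among the framework versions, one plausible way to use the adapter is to load the base model with Transformers and attach the adapter with PEFT. The following is a minimal sketch, not a confirmed recipe from the author: the helper name `load_adapter_model` is hypothetical, the adapter repository id is taken from this page, and access to the base model may require approval on the Hub. Loading an 8B model also needs substantial memory.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wic_789_1760637918"  # repository id as shown on this page


def load_adapter_model():
    """Hypothetical helper: load the base model and attach this adapter."""
    # Imported here so the module can be imported without these libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID)
    base_model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID)
    # Wrap the base model with the adapter weights from the Hub.
    model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
    model.eval()  # inference mode
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter_model()` downloads both repositories on first use.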


Model tree for rbelanec/train_wic_789_1760637918

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
