train_wic_1754652155

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

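Loss: 0.3459 (matches the lowest validation loss in the training results below, reached at epoch 5.5, step 6721; the validation loss at the final step 12220 is 0.3483)
Num Input Tokens Seen: 4213808 (cumulative, at the final step 12220)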

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 123
optimizer: adamw_torch with betas=(0.9,0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10.0
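
As a rough illustration, the hyperparameters above can be written with Hugging Face TrainingArguments field names, and the learning-rate curve can be reproduced in plain Python. This is a sketch, not the author's training script. It assumes a single device and no gradient accumulation, since the card lists neither. It takes the total of 12220 optimizer steps from the training results table. It also assumes the linear-warmup-then-half-cosine-decay shape that Transformers uses for lr_scheduler_type "cosine". TRAINING_KWARGS and cosine_lr are hypothetical names used only for illustration.

```python
import math

# Card hyperparameters expressed with TrainingArguments field names.
# Assumption: single device, no gradient accumulation (the card lists neither).
TRAINING_KWARGS = {
    "learning_rate": 5e-05,
    "per_device_train_batch_size": 4,
    "per_device_eval_batch_size": 4,
    "seed": 123,
    "optim": "adamw_torch",
    "adam_beta1": 0.9,
    "adam_beta2": 0.999,
    "adam_epsilon": 1e-08,
    "lr_scheduler_type": "cosine",
    "warmup_ratio": 0.1,
    "num_train_epochs": 10.0,
}

TOTAL_STEPS = 12220  # final step in the training results table
# Warmup length derived from warmup_ratio, rounded up to a whole step.
WARMUP_STEPS = math.ceil(TRAINING_KWARGS["warmup_ratio"] * TOTAL_STEPS)


def cosine_lr(step, base_lr=TRAINING_KWARGS["learning_rate"],
              warmup_steps=WARMUP_STEPS, total_steps=TOTAL_STEPS):
    """Hypothetical helper: linear warmup to base_lr, then half-cosine decay to 0."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 611, 1222, 6721, 12220):
    print(f"step {step:>5}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions warmup lasts 1222 steps, which is exactly the first epoch. The rate then halves to 2.5e-05 at step 6721, the midpoint of the decay, and reaches 0 at step 12220.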

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
1.9907	0.5	611	1.6218	210240
0.3602	1.0	1222	0.3927	421528
0.3732	1.5	1833	0.3616	632632
0.339	2.0	2444	0.3660	843368
0.3131	2.5	3055	0.3525	1054024
0.2979	3.0	3666	0.3668	1264408
0.3421	3.5	4277	0.3686	1475000
0.329	4.0	4888	0.3755	1685768
0.3469	4.5	5499	0.3563	1895752
0.405	5.0	6110	0.3546	2106968
0.3423	5.5	6721	0.3459	2318136
0.3369	6.0	7332	0.3487	2528648
0.3368	6.5	7943	0.3510	2739720
0.3344	7.0	8554	0.3467	2949592
0.3402	7.5	9165	0.3471	3160056
0.3079	8.0	9776	0.3470	3371056
0.3528	8.5	10387	0.3470	3581616
0.3432	9.0	10998	0.3475	3792672
0.3226	9.5	11609	0.3480	4003136
0.3417	10.0	12220	0.3483	4213808
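
The evaluation loss quoted at the top of the card equals the minimum of the validation-loss column, while the token count equals the last row. The sketch below recovers both from the table. It copies the rows verbatim and splits on whitespace, so it does not depend on the exact column separator. parse_rows is a hypothetical helper. The final check shows that every evaluation step is epoch × 1222, i.e. 1222 optimizer steps per epoch.

```python
# Rows copied from the training results table:
# training loss, epoch, step, validation loss, input tokens seen
TABLE = """
1.9907 0.5 611 1.6218 210240
0.3602 1.0 1222 0.3927 421528
0.3732 1.5 1833 0.3616 632632
0.339 2.0 2444 0.3660 843368
0.3131 2.5 3055 0.3525 1054024
0.2979 3.0 3666 0.3668 1264408
0.3421 3.5 4277 0.3686 1475000
0.329 4.0 4888 0.3755 1685768
0.3469 4.5 5499 0.3563 1895752
0.405 5.0 6110 0.3546 2106968
0.3423 5.5 6721 0.3459 2318136
0.3369 6.0 7332 0.3487 2528648
0.3368 6.5 7943 0.3510 2739720
0.3344 7.0 8554 0.3467 2949592
0.3402 7.5 9165 0.3471 3160056
0.3079 8.0 9776 0.3470 3371056
0.3528 8.5 10387 0.3470 3581616
0.3432 9.0 10998 0.3475 3792672
0.3226 9.5 11609 0.3480 4003136
0.3417 10.0 12220 0.3483 4213808
"""


def parse_rows(text):
    """Turn whitespace-separated table lines into dicts with typed fields."""
    rows = []
    for line in text.strip().splitlines():
        train_loss, epoch, step, val_loss, tokens = line.split()
        rows.append({
            "train_loss": float(train_loss),
            "epoch": float(epoch),
            "step": int(step),
            "val_loss": float(val_loss),
            "tokens": int(tokens),
        })
    return rows


rows = parse_rows(TABLE)
best = min(rows, key=lambda r: r["val_loss"])  # lowest validation loss
final = rows[-1]                               # last evaluation

print(f"best:  epoch {best['epoch']}, step {best['step']}, val loss {best['val_loss']}")
print(f"final: epoch {final['epoch']}, step {final['step']}, val loss {final['val_loss']}, "
      f"tokens {final['tokens']}")
# Every evaluation step is an exact multiple of 1222 steps per epoch.
print(all(r["step"] == round(r["epoch"] * 1222) for r in rows))
```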

Framework versions

PEFT 0.15.2
Transformers 4.51.3
PyTorch 2.8.0+cu128
Datasets 3.6.0
Tokenizers 0.21.1


Model tree for rbelanec/train_wic_1754652155

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter for the base model