train_wic_1754652154

This model is a PEFT adapter fine-tuned from meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2713
  • Num Input Tokens Seen: 4213808
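
Because this repo is a PEFT adapter rather than a full model, inference requires loading the base model first and attaching the adapter. A minimal loading sketch follows; the repo ids are taken from this card, while the dtype and device placement are assumptions:

```python
# Minimal sketch: attach this PEFT adapter to its base model for inference.
# Repo ids come from this card; torch_dtype/device_map are assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_wic_1754652154"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()
```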

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the sketch after this list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
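
The exact training script is not part of this card, but the values above map onto transformers.TrainingArguments roughly as follows (a sketch only; output_dir is a placeholder and any PEFT-specific settings are omitted):

```python
# Sketch: mirrors the hyperparameters listed above as TrainingArguments.
# output_dir is a placeholder; the actual training script is not in this card.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_wic_1754652154",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```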

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
0.3710          0.5     611     0.2969            210240
0.3224          1.0     1222    0.2940            421528
0.2289          1.5     1833    0.2676            632632
0.1530          2.0     2444    0.2675            843368
0.2494          2.5     3055    0.2532            1054024
0.2270          3.0     3666    0.2490            1264408
0.2591          3.5     4277    0.2664            1475000
0.1779          4.0     4888    0.2493            1685768
0.1299          4.5     5499    0.2297            1895752
0.3859          5.0     6110    0.2657            2106968
0.2077          5.5     6721    0.2430            2318136
0.1509          6.0     7332    0.2500            2528648
0.3092          6.5     7943    0.3005            2739720
0.2423          7.0     8554    0.2592            2949592
0.2607          7.5     9165    0.2971            3160056
0.1284          8.0     9776    0.2966            3371056
0.0867          8.5     10387   0.3230            3581616
0.0984          9.0     10998   0.3299            3792672
0.1980          9.5     11609   0.3371            4003136
0.0539          10.0    12220   0.3325            4213808

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • PyTorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1
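
To match this environment, pinning the listed versions should suffice; a sketch (the cu128 index URL is an assumption about how PyTorch 2.8.0+cu128 was installed):

```
pip install peft==0.15.2 transformers==4.51.3 datasets==3.6.0 tokenizers==0.21.1
pip install torch==2.8.0 --index-url https://download.pytorch.org/whl/cu128
```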