train_wic_456_1760637808

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the wic dataset. It achieves the following results on the evaluation set:

Loss: 0.2487
Num Input Tokens Seen: 8434688

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 456
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
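The cosine schedule with warmup listed above can be illustrated with a small sketch. It is not the training code itself: it assumes the usual linear-warmup-then-cosine-decay shape, that the warmup length is the warmup ratio applied to the total step count (rounded up), and that the total step count is the final Step value (24440) from the training results table. The names `lr_at`, `TOTAL_STEPS`, and `WARMUP_STEPS` are hypothetical and chosen for illustration.

```python
import math

BASE_LR = 5e-05        # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1     # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 24440    # final Step value in the training results table

# Assumption: warmup length = ratio * total steps, rounded up.
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at an optimizer step under linear warmup + cosine decay."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to BASE_LR over the warmup steps.
        return BASE_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from BASE_LR down to 0.
    return BASE_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for step in (0, 1222, WARMUP_STEPS, 13442, TOTAL_STEPS):
        print(f"step {step:>5}: lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup spans the first two epochs (1222 steps per epoch in the results table), and the learning rate has decayed to half its peak by the end of epoch 11.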

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.3073	1.0	1222	0.3309	421520
0.2569	2.0	2444	0.3059	843032
0.2599	3.0	3666	0.2784	1265032
0.1718	4.0	4888	0.2665	1687192
0.2312	5.0	6110	0.2587	2108776
0.2408	6.0	7332	0.2591	2530232
0.3833	7.0	8554	0.2546	2952296
0.1846	8.0	9776	0.2499	3374128
0.3673	9.0	10998	0.2543	3795712
0.2573	10.0	12220	0.2487	4217816
0.2382	11.0	13442	0.2525	4639632
0.1639	12.0	14664	0.2488	5060952
0.2474	13.0	15886	0.2513	5482656
0.2604	14.0	17108	0.2523	5904024
0.1286	15.0	18330	0.2506	6325800
0.1977	16.0	19552	0.2505	6747856
0.2337	17.0	20774	0.2516	7169800
0.2220	18.0	21996	0.2529	7591280
0.1566	19.0	23218	0.2516	8013240
0.2596	20.0	24440	0.2516	8434688
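A short sketch for reading the table: it picks the epoch with the lowest validation loss and compares it with the final epoch. The values are copied from the Validation Loss column above. The lowest value matches the evaluation loss reported at the top of the card (0.2487), which suggests, though the card does not state it, that the headline loss comes from the best checkpoint rather than the last one.

```python
# (epoch, validation_loss) pairs copied from the training results table.
VALIDATION_LOSS = [
    (1, 0.3309), (2, 0.3059), (3, 0.2784), (4, 0.2665), (5, 0.2587),
    (6, 0.2591), (7, 0.2546), (8, 0.2499), (9, 0.2543), (10, 0.2487),
    (11, 0.2525), (12, 0.2488), (13, 0.2513), (14, 0.2523), (15, 0.2506),
    (16, 0.2505), (17, 0.2516), (18, 0.2529), (19, 0.2516), (20, 0.2516),
]

# Epoch with the lowest validation loss (ties resolve to the earliest epoch).
best_epoch, best_loss = min(VALIDATION_LOSS, key=lambda row: row[1])

# Last evaluated epoch, for comparison with the best one.
final_epoch, final_loss = VALIDATION_LOSS[-1]

if __name__ == "__main__":
    print(f"best:  epoch {best_epoch}, validation loss {best_loss:.4f}")
    print(f"final: epoch {final_epoch}, validation loss {final_loss:.4f}")
    print(f"gap:   {final_loss - best_loss:.4f}")
```

The validation loss is essentially flat from about epoch 10 onward, so the later epochs add training tokens without a measurable gain on the evaluation set.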

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
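
Since the framework list includes PEFT, the weights are presumably published as an adapter on top of the base model. The following is a minimal loading sketch under that assumption. It uses the repository ID `rbelanec/train_wic_456_1760637808` and the base model ID given in this card; the helper name `load_adapter_model` and the choice of `bfloat16` are illustrative assumptions, not settings documented here. The base model may require accepting its license on the Hub before download. The card does not document a prompt format, so no inference call is shown.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_wic_456_1760637808"


def load_adapter_model(base_model_id: str = BASE_MODEL_ID,
                       adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Heavy dependencies are imported lazily so the module can be imported
    # without torch/transformers/peft being installed.
    import torch
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    base_model = AutoModelForCausalLM.from_pretrained(
        base_model_id,
        torch_dtype=torch.bfloat16,  # assumption: half precision to fit an 8B model
    )
    # Wrap the base model with the fine-tuned adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()
    return tokenizer, model
```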

Model tree for rbelanec/train_wic_456_1760637808

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model