train_qnli_42_1760637630

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the qnli dataset. It achieves the following results on the evaluation set:

  • Loss: 0.2401
  • Num Input Tokens Seen: 184219696

Model description

This model is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct; only the adapter weights are stored in this repository, so it must be loaded together with the base model, as in the sketch below.
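
A minimal loading sketch follows, assuming the adapter is published under the repo id rbelanec/train_qnli_42_1760637630 (taken from this card) and that bf16 inference is acceptable; neither detail is documented as the author's setup.

```python
# Minimal sketch: load the base model and apply this PEFT adapter.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_qnli_42_1760637630"  # assumption: repo id from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype=torch.bfloat16,  # assumption: precision is not documented in this card
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()
```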

Intended uses & limitations

More information needed

Training and evaluation data

The model was fine-tuned on the qnli dataset, presumably the GLUE QNLI task: a binary classification task derived from SQuAD in which each example pairs a question with a context sentence, and the label indicates whether the sentence contains the answer to the question. A loading sketch follows.
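
For reference, a sketch of loading QNLI with the datasets library; the mapping of "qnli" to the GLUE configuration is an assumption.

```python
# Sketch: load QNLI, assuming it is the GLUE "qnli" configuration.
from datasets import load_dataset

qnli = load_dataset("glue", "qnli")
print(qnli)              # train / validation / test splits
print(qnli["train"][0])  # fields: question, sentence, label, idx
```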

Training procedure

Training hyperparameters

The following hyperparameters were used during training (an illustrative configuration sketch follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
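
As an illustration only, these settings correspond to a transformers TrainingArguments configuration roughly like the sketch below; the output_dir, the surrounding Trainer wiring, and any PEFT-specific setup are assumptions, not the author's actual script.

```python
# Hedged sketch: TrainingArguments mirroring the hyperparameters above.
# output_dir and everything outside these arguments are assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_qnli_42_1760637630",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```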

Training results

| Training Loss | Epoch | Step   | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:------:|:---------------:|:-----------------:|
| 0.1503        | 2.0   | 41898  | 0.1027          | 18419408          |
| 0.0625        | 4.0   | 83796  | 0.0419          | 36841168          |
| 0.0036        | 6.0   | 125694 | 0.0515          | 55267408          |
| 0.0004        | 8.0   | 167592 | 0.0685          | 73689808          |
| 0.0015        | 10.0  | 209490 | 0.0959          | 92116576          |
| 0.0001        | 12.0  | 251388 | 0.1372          | 110535312         |
| 0.0           | 14.0  | 293286 | 0.1740          | 128950240         |
| 0.0           | 16.0  | 335184 | 0.2036          | 147370704         |
| 0.0           | 18.0  | 377082 | 0.2238          | 165801024         |
| 0.0           | 20.0  | 418980 | 0.2401          | 184219696         |
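
The validation loss reaches its minimum (0.0419) at epoch 4 and climbs steadily afterwards while the training loss goes to zero, which is consistent with overfitting past that point. As a quick visual check, the curve can be plotted directly from the table; the matplotlib code below is illustrative, with all values copied from the table above.

```python
# Sketch: plot the validation-loss curve from the results table above.
import matplotlib.pyplot as plt

epochs = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
val_loss = [0.1027, 0.0419, 0.0515, 0.0685, 0.0959,
            0.1372, 0.1740, 0.2036, 0.2238, 0.2401]

plt.plot(epochs, val_loss, marker="o")
plt.xlabel("Epoch")
plt.ylabel("Validation loss")
plt.title("QNLI validation loss (minimum at epoch 4)")
plt.show()
```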

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4