train_cola_42_1760637586

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5116
  • Num Input Tokens Seen: 6524480

Model description

More information needed. What the card itself establishes: this is a PEFT adapter trained on top of meta-llama/Meta-Llama-3-8B-Instruct, so the base model weights are required at load time (see the sketch below).
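The Framework versions below show this was trained with PEFT, so the repository holds adapter weights rather than a full model. A minimal loading sketch, assuming the repository id matches the card title and a standard PEFT adapter layout (both assumptions, not verified here):

```python
# Minimal loading sketch. The repo id is assumed from the card title and the
# checkpoint is assumed to be a standard PEFT adapter; adjust as needed.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_42_1760637586"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
base_model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, ADAPTER_ID)
model.eval()
```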

Intended uses & limitations

More information needed. Given the CoLA training data, the natural use is judging the grammatical acceptability of English sentences; since the prompt format used during training is not documented, the inference sketch below is illustrative only.
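The sketch below continues from the loading sketch above (reusing model and tokenizer) and uses an invented instruction prompt purely to illustrate the task; the real template used during fine-tuning may differ.

```python
# Illustration only: the prompt wording is an assumption, since the template
# used during fine-tuning is not documented in this card.
prompt = (
    "Is the following sentence grammatically acceptable? "
    "Answer 'acceptable' or 'unacceptable'.\n"
    "Sentence: The boys is playing outside."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=8, do_sample=False)
answer = tokenizer.decode(
    output[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True
)
print(answer)
```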

Training and evaluation data

More information needed. The dataset name cola most plausibly refers to CoLA, the Corpus of Linguistic Acceptability from the GLUE benchmark, in which single English sentences are labeled as grammatically acceptable or not; the sketch below loads it under that assumption.
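Assuming cola means the GLUE CoLA configuration distributed through the datasets library (an assumption; the card does not say which copy was used), the data can be inspected like this:

```python
# Assumes "cola" refers to the GLUE CoLA configuration on the Hugging Face Hub.
from datasets import load_dataset

cola = load_dataset("glue", "cola")
print(cola)              # train / validation / test splits
print(cola["train"][0])  # {'sentence': ..., 'label': 0 or 1, 'idx': ...}; 1 = acceptable
```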

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch (betas=(0.9, 0.999), epsilon=1e-08, no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
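As referenced above, here is a sketch mapping the listed values onto transformers TrainingArguments. Everything not listed in the card (weight decay, gradient accumulation, the LoRA/PEFT configuration itself, the trainer used) is unknown and left out, so this is a partial reconstruction, not the author's actual configuration:

```python
# Partial reconstruction from the listed hyperparameters only; all other
# settings (weight decay, PEFT config, trainer choice, ...) are unknown.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_42_1760637586",
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```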

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.2121        |   2.0 |  3420 | 0.2554          |            652448 |
| 0.2753        |   4.0 |  6840 | 0.2554          |           1304768 |
| 0.267         |   6.0 | 10260 | 0.2569          |           1956608 |
| 0.2764        |   8.0 | 13680 | 0.2677          |           2608576 |
| 0.1592        |  10.0 | 17100 | 0.2819          |           3261088 |
| 0.1569        |  12.0 | 20520 | 0.3185          |           3914304 |
| 0.1294        |  14.0 | 23940 | 0.3592          |           4566720 |
| 0.0758        |  16.0 | 27360 | 0.4484          |           5220064 |
| 0.0682        |  18.0 | 30780 | 0.4960          |           5873024 |
| 0.124         |  20.0 | 34200 | 0.5116          |           6524480 |

Note that validation loss bottoms out at 0.2554 around epochs 2–4 and climbs steadily thereafter while training loss keeps falling, a typical overfitting pattern; the reported final loss of 0.5116 is the last checkpoint, not the best one.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4