train_cola_101112_1760638042

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

  • Loss: 0.5172
  • Num Input Tokens Seen: 6514784

Model description

This repository contains a PEFT adapter for meta-llama/Meta-Llama-3-8B-Instruct rather than a full set of model weights; the card provides no further description. It can be loaded as sketched below.
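A minimal loading sketch, assuming access to the gated meta-llama base weights. The prompt shown is hypothetical; the card does not document the template used during fine-tuning:

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

# Download the adapter from this repository and apply it on top of the
# base meta-llama/Meta-Llama-3-8B-Instruct weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_cola_101112_1760638042",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical acceptability prompt -- the actual training template is not
# documented in this card.
prompt = "Is the following sentence grammatically acceptable? 'The book was written by her.'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```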

Intended uses & limitations

More information needed

Training and evaluation data

The model was trained and evaluated on the cola dataset referenced above; the card gives no further detail on splits or preprocessing.
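If "cola" here refers to the GLUE CoLA (Corpus of Linguistic Acceptability) task, an assumption this card does not confirm, the data could be loaded with the pinned datasets version like this:

```python
from datasets import load_dataset

# Assumption: "cola" means the GLUE CoLA acceptability task hosted at
# nyu-mll/glue; the card does not name the exact dataset repository.
dataset = load_dataset("nyu-mll/glue", "cola")
print(dataset["train"][0])  # {'sentence': ..., 'label': 0 (unacceptable) or 1, 'idx': ...}
```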

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 101112
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
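For reference, the list above maps onto transformers.TrainingArguments roughly as follows. This is a reconstruction, not the original training script; output_dir is a guess, and settings the card does not record (precision, logging, saving, PEFT/LoRA config) are omitted:

```python
from transformers import TrainingArguments

# Reconstructed from the hyperparameter list above.
training_args = TrainingArguments(
    output_dir="train_cola_101112_1760638042",  # guess; not recorded in the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=101112,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```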

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 0.279         | 2.0   | 3420  | 0.2597          | 651456            |
| 0.2432        | 4.0   | 6840  | 0.2594          | 1302752           |
| 0.4031        | 6.0   | 10260 | 0.2784          | 1954240           |
| 0.3046        | 8.0   | 13680 | 0.2787          | 2604736           |
| 0.3199        | 10.0  | 17100 | 0.2887          | 3255968           |
| 0.1332        | 12.0  | 20520 | 0.3248          | 3908768           |
| 0.2753        | 14.0  | 23940 | 0.4009          | 4559712           |
| 0.0716        | 16.0  | 27360 | 0.4623          | 5210880           |
| 0.1233        | 18.0  | 30780 | 0.5037          | 5862368           |
| 0.0227        | 20.0  | 34200 | 0.5172          | 6514784           |
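Validation loss bottoms out at 0.2594 around epoch 4 (step 6840) and rises steadily afterwards while training loss keeps falling, the usual signature of overfitting; the final loss of 0.5172 reported above is therefore not the best checkpoint by validation loss. If intermediate checkpoints were kept, which the card does not say, the epoch-4 adapter could be loaded instead; the subfolder name below is hypothetical:

```python
from peft import AutoPeftModelForCausalLM

# Hypothetical checkpoint path: assumes an adapter snapshot was saved at
# step 6840 (epoch 4), which this card does not confirm.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_cola_101112_1760638042",
    subfolder="checkpoint-6840",
)
```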

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • Pytorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4