# train_cola_123_1768397590
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA dataset. It achieves the following results on the evaluation set:
- Loss: 0.2015 (the best validation loss, reached at epoch 2.0; see the training results below)
- Num input tokens seen: 3465288
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure
### Training hyperparameters
The following hyperparameters were used during training (a `TrainingArguments` sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1648 | 0.5 | 1924 | 0.2437 | 173872 |
| 0.2413 | 1.0 | 3848 | 0.2133 | 346872 |
| 0.266 | 1.5 | 5772 | 0.2779 | 520296 |
| 0.3389 | 2.0 | 7696 | 0.2015 | 693752 |
| 0.1653 | 2.5 | 9620 | 0.2242 | 867416 |
| 0.1735 | 3.0 | 11544 | 0.2359 | 1040128 |
| 0.2647 | 3.5 | 13468 | 0.2158 | 1212976 |
| 0.2417 | 4.0 | 15392 | 0.2485 | 1386696 |
| 0.1935 | 4.5 | 17316 | 0.2526 | 1559896 |
| 0.0029 | 5.0 | 19240 | 0.2694 | 1733072 |
| 0.0014 | 5.5 | 21164 | 0.2814 | 1906160 |
| 0.0025 | 6.0 | 23088 | 0.2646 | 2079640 |
| 0.2451 | 6.5 | 25012 | 0.2899 | 2253000 |
| 0.4219 | 7.0 | 26936 | 0.2727 | 2425920 |
| 0.6786 | 7.5 | 28860 | 0.2955 | 2598960 |
| 0.0008 | 8.0 | 30784 | 0.2935 | 2772144 |
| 0.0016 | 8.5 | 32708 | 0.2971 | 2944864 |
| 0.6076 | 9.0 | 34632 | 0.3067 | 3118472 |
| 0.0008 | 9.5 | 36556 | 0.3064 | 3291720 |
| 0.2248 | 10.0 | 38480 | 0.3061 | 3465288 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4