# train_cola_456_1768397597
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the cola dataset (CoLA: Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set:
- Loss: 0.1872 (the lowest validation loss of the run, reached at epoch 2.0; see the training results table below)
- Num input tokens seen: 3463936
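
Since the framework list below includes PEFT, this repository most likely hosts a parameter-efficient adapter rather than full model weights. The following is a minimal, hypothetical loading and inference sketch: the adapter repo id is taken from the model tree at the bottom of this card, and the prompt template is an assumption, since the actual format used during training is not documented here.

```python
# Hypothetical inference sketch: assumes the repo is a PEFT adapter on top of
# Meta-Llama-3-8B-Instruct and that the task is CoLA-style acceptability
# judgment. The prompt template below is a guess, not the training format.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_456_1768397597"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

prompt = (
    "Is the following sentence grammatically acceptable? Answer yes or no.\n"
    "Sentence: The cat sat on the mat."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=5)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```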
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
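
Mapped onto transformers' `TrainingArguments`, these settings would look roughly like the sketch below. This is an illustration only; the original training script (and the PEFT configuration it used) is not part of this card, and the `output_dir` name is a placeholder.

```python
# Illustrative mapping of the listed hyperparameters onto TrainingArguments.
# Not the authors' exact setup; the PEFT/LoRA config is undocumented here.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_456_1768397597",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
)
```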
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.4604 | 0.5 | 1924 | 0.2828 | 173216 |
| 0.2691 | 1.0 | 3848 | 0.2345 | 346216 |
| 0.1579 | 1.5 | 5772 | 0.2443 | 519400 |
| 0.251 | 2.0 | 7696 | 0.1872 | 692896 |
| 0.1917 | 2.5 | 9620 | 0.2141 | 865952 |
| 0.0034 | 3.0 | 11544 | 0.2258 | 1039432 |
| 0.0041 | 3.5 | 13468 | 0.2401 | 1212696 |
| 0.2462 | 4.0 | 15392 | 0.2494 | 1385744 |
| 0.6043 | 4.5 | 17316 | 0.2615 | 1559200 |
| 0.002 | 5.0 | 19240 | 0.2415 | 1732008 |
| 0.1836 | 5.5 | 21164 | 0.2970 | 1905112 |
| 0.0029 | 6.0 | 23088 | 0.2535 | 2078472 |
| 0.0014 | 6.5 | 25012 | 0.2912 | 2251864 |
| 0.2213 | 7.0 | 26936 | 0.2719 | 2425080 |
| 0.0073 | 7.5 | 28860 | 0.2845 | 2597816 |
| 0.2082 | 8.0 | 30784 | 0.2876 | 2771400 |
| 0.0016 | 8.5 | 32708 | 0.2916 | 2945048 |
| 0.0077 | 9.0 | 34632 | 0.2925 | 3117888 |
| 0.0014 | 9.5 | 36556 | 0.2975 | 3291632 |
| 0.0022 | 10.0 | 38480 | 0.2972 | 3463936 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
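
To confirm that a local environment matches these pins, a quick check like the following can help; this is purely illustrative, and reasonably close versions may also work.

```python
# Print installed versions to compare against the pins listed above.
import datasets
import peft
import tokenizers
import torch
import transformers

for name, mod in [
    ("peft", peft),
    ("transformers", transformers),
    ("torch", torch),
    ("datasets", datasets),
    ("tokenizers", tokenizers),
]:
    print(f"{name}: {mod.__version__}")
```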
## Model tree for rbelanec/train_cola_456_1768397597

Base model: [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct)