train_cola_123_1760637703

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set:

  • Loss: 0.3014
  • Num Input Tokens Seen: 7337920
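Since this is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, it must be loaded together with the base model. Below is a minimal loading sketch, assuming the adapter is hosted as rbelanec/train_cola_123_1760637703 and that you have access to the gated base model; the prompt is a hypothetical example, since the instruction template used during fine-tuning is not documented in this card.

```python
# Minimal loading sketch (assumptions: the adapter repo id below, and
# access to the gated meta-llama/Meta-Llama-3-8B-Instruct base model).
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

adapter_id = "rbelanec/train_cola_123_1760637703"

# AutoPeftModelForCausalLM reads the adapter config, pulls in the base
# model it references, and attaches the trained adapter weights on top.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# Hypothetical prompt: the exact template used for CoLA fine-tuning
# is not documented in this card.
messages = [
    {"role": "user",
     "content": 'Is this sentence grammatically acceptable? "The book was written by John."'}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=10)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```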

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
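These settings can be expressed as a transformers.TrainingArguments configuration. This is a sketch rather than the exact training script: output_dir is an assumption, the listed batch sizes are assumed to be per device, and anything not in the list above (logging, saving, precision) is omitted.

```python
# Sketch of a TrainingArguments object mirroring the hyperparameters above.
# output_dir is an assumed name; all other values come from the list.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_123_1760637703",  # assumed
    learning_rate=1e-3,
    per_device_train_batch_size=4,   # assuming batch sizes are per device
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,                  # betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,                # lr_scheduler_warmup_ratio
    num_train_epochs=20,
)
```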

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.1875        | 1.0   | 1924  | 0.2801          | 367320            |
| 0.2153        | 2.0   | 3848  | 0.2494          | 734600            |
| 0.2264        | 3.0   | 5772  | 0.2482          | 1101216           |
| 0.2437        | 4.0   | 7696  | 0.2485          | 1468552           |
| 0.2505        | 5.0   | 9620  | 0.2564          | 1834816           |
| 0.2634        | 6.0   | 11544 | 0.2473          | 2201584           |
| 0.241         | 7.0   | 13468 | 0.2489          | 2568288           |
| 0.2443        | 8.0   | 15392 | 0.2457          | 2935056           |
| 0.2382        | 9.0   | 17316 | 0.2410          | 3301760           |
| 0.2569        | 10.0  | 19240 | 0.2436          | 3669168           |
| 0.1061        | 11.0  | 21164 | 0.2083          | 4036096           |
| 0.2432        | 12.0  | 23088 | 0.2028          | 4403128           |
| 0.1099        | 13.0  | 25012 | 0.1972          | 4769264           |
| 0.277         | 14.0  | 26936 | 0.1904          | 5136352           |
| 0.2623        | 15.0  | 28860 | 0.1892          | 5503048           |
| 0.0648        | 16.0  | 30784 | 0.1909          | 5869824           |
| 0.1078        | 17.0  | 32708 | 0.1998          | 6236752           |
| 0.3221        | 18.0  | 34632 | 0.2093          | 6603776           |
| 0.2195        | 19.0  | 36556 | 0.2154          | 6970736           |
| 0.0705        | 20.0  | 38480 | 0.2163          | 7337920           |
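Validation loss reaches its minimum (0.1892) at epoch 15 and rises over the remaining epochs, so the epoch-15 checkpoint may generalize better than the final one.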

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4