# train_cola_123_1760637706
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set:

- Loss: 1.2137
- Num input tokens seen: 7337920
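Since PEFT is listed under the framework versions below, the published weights are presumably a parameter-efficient adapter on top of the base model rather than a full checkpoint. A minimal loading sketch under that assumption (the repo id rbelanec/train_cola_123_1760637706 is taken from the Hub page; the adapter format itself is not confirmed by this card):

```python
# Hedged sketch: attach this fine-tuned PEFT adapter to the base model.
# Assumes the repo hosts a PEFT adapter, not a full model checkpoint.
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_123_1760637706"  # repo id from the Hub page

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, device_map="auto")
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()
```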
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 123
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
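As a rough illustration only (the original training script is not part of this card), the hyperparameters above map onto a transformers TrainingArguments configuration along these lines; output_dir and anything not listed above are assumptions:

```python
# Hypothetical sketch of TrainingArguments matching the listed
# hyperparameters; PEFT/adapter settings are omitted because the card
# does not record them.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_123_1760637706",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```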
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.7199 | 1.0 | 1924 | 1.3822 | 367320 |
| 1.2701 | 2.0 | 3848 | 1.2342 | 734600 |
| 1.6967 | 3.0 | 5772 | 1.2344 | 1101216 |
| 1.6043 | 4.0 | 7696 | 1.2395 | 1468552 |
| 1.1936 | 5.0 | 9620 | 1.2258 | 1834816 |
| 0.8882 | 6.0 | 11544 | 1.2179 | 2201584 |
| 0.8398 | 7.0 | 13468 | 1.2250 | 2568288 |
| 1.05 | 8.0 | 15392 | 1.2160 | 2935056 |
| 1.462 | 9.0 | 17316 | 1.2254 | 3301760 |
| 1.7389 | 10.0 | 19240 | 1.2256 | 3669168 |
| 0.8938 | 11.0 | 21164 | 1.2306 | 4036096 |
| 1.1366 | 12.0 | 23088 | 1.2173 | 4403128 |
| 1.146 | 13.0 | 25012 | 1.2324 | 4769264 |
| 1.0346 | 14.0 | 26936 | 1.2137 | 5136352 |
| 1.2975 | 15.0 | 28860 | 1.2231 | 5503048 |
| 1.3061 | 16.0 | 30784 | 1.2178 | 5869824 |
| 1.1825 | 17.0 | 32708 | 1.2238 | 6236752 |
| 1.1393 | 18.0 | 34632 | 1.2238 | 6603776 |
| 1.2381 | 19.0 | 36556 | 1.2238 | 6970736 |
| 1.0217 | 20.0 | 38480 | 1.2238 | 7337920 |
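The headline evaluation loss of 1.2137 is the minimum of the validation-loss column, reached at epoch 14. A quick sanity check over the table values (the dictionary below simply transcribes the table; nothing is read from run logs):

```python
# Recover the best epoch from the per-epoch validation losses above.
val_loss = {
    1: 1.3822, 2: 1.2342, 3: 1.2344, 4: 1.2395, 5: 1.2258,
    6: 1.2179, 7: 1.2250, 8: 1.2160, 9: 1.2254, 10: 1.2256,
    11: 1.2306, 12: 1.2173, 13: 1.2324, 14: 1.2137, 15: 1.2231,
    16: 1.2178, 17: 1.2238, 18: 1.2238, 19: 1.2238, 20: 1.2238,
}
best_epoch = min(val_loss, key=val_loss.get)
print(best_epoch, val_loss[best_epoch])  # -> 14 1.2137
```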
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4
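For reproducibility, a small hedged check that the installed environment matches these pins (assumes all five packages are importable and that the torch version string carries the cu128 build tag):

```python
# Hypothetical environment check against the pinned versions above.
import datasets
import peft
import tokenizers
import torch
import transformers

expected = {
    "peft": (peft, "0.17.1"),
    "transformers": (transformers, "4.51.3"),
    "torch": (torch, "2.9.0+cu128"),
    "datasets": (datasets, "4.0.0"),
    "tokenizers": (tokenizers, "0.21.4"),
}
for name, (module, pin) in expected.items():
    assert module.__version__ == pin, f"{name}: {module.__version__} != {pin}"
```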