train_cola_42_1760637589

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:

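Loss: 0.1704 (the lowest validation loss in the training results below, reached at epoch 3)
Num Input Tokens Seen: 7336064 (cumulative over all 20 epochs)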

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
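The scheduler settings above can be made concrete with a short pure-Python sketch of a linear-warmup, half-cosine-decay schedule. It assumes the run used the Transformers `cosine` scheduler with its default half cycle, that `lr_scheduler_warmup_ratio` is turned into a step count by rounding `total_steps * 0.1` up, and that the 1924 steps per epoch in the training results table are optimizer steps. Under those assumptions, warmup covers 3848 steps (exactly the first two epochs) and the learning rate decays to 0 at step 38480. The function `lr_at` and the constant names are illustrative, not part of any library.

```python
import math

# Values from the hyperparameter list and the training results table.
PEAK_LR = 5e-05
STEPS_PER_EPOCH = 1924   # assumption: logged steps are optimizer steps
NUM_EPOCHS = 20
WARMUP_RATIO = 0.1

TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS
# Assumption: warmup steps = ceil(total_steps * warmup_ratio).
WARMUP_STEPS = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step, peak=PEAK_LR, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at optimizer step `step`: linear warmup, then half-cosine decay to 0."""
    if step < warmup:
        # Linear ramp from 0 up to the peak learning rate.
        return peak * step / max(1, warmup)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup) / max(1, total - warmup)
    # Half a cosine period: factor goes from 1 at progress=0 to 0 at progress=1.
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Learning rate at a few epoch boundaries.
for epoch in (1, 2, 3, 10, 20):
    step = epoch * STEPS_PER_EPOCH
    print(f"epoch {epoch:2d} (step {step:5d}): lr = {lr_at(step):.3e}")
```

Re-implementing the formula rather than calling `transformers.get_cosine_schedule_with_warmup` keeps the sketch dependency-free; in an actual run the Trainer builds the scheduler itself from these arguments.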

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen (cumulative)
0.1674	1.0	1924	0.1823	366856
0.2198	2.0	3848	0.2092	734320
0.1399	3.0	5772	0.1704	1100800
0.0309	4.0	7696	0.2225	1467824
0.0088	5.0	9620	0.2168	1834632
0.0706	6.0	11544	0.2784	2202264
0.0011	7.0	13468	0.3766	2568880
0.0004	8.0	15392	0.3274	2935520
0.0948	9.0	17316	0.5466	3302192
0.0486	10.0	19240	0.5260	3668584
0.0	11.0	21164	0.4925	4034712
0.0	12.0	23088	0.5612	4401480
0.0	13.0	25012	0.5035	4768408
0.0	14.0	26936	0.5319	5135240
0.0158	15.0	28860	0.5083	5501784
0.0	16.0	30784	0.6302	5868800
0.0	17.0	32708	0.6221	6235472
0.0	18.0	34632	0.6385	6601760
0.0	19.0	36556	0.6424	6968720
0.0	20.0	38480	0.6508	7336064
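In the table, validation loss is lowest at epoch 3 (0.1704, the value reported at the top of this card) and climbs to 0.6508 by epoch 20 while the logged training loss falls to 0.0, which suggests the later epochs overfit. A small Python sketch that locates the best epoch from the logged values; the numbers are transcribed by hand from the table, and nothing here reads the Trainer's own log files.

```python
# Validation loss per epoch, transcribed by hand from the training results table.
VAL_LOSS = {
    1: 0.1823, 2: 0.2092, 3: 0.1704, 4: 0.2225, 5: 0.2168,
    6: 0.2784, 7: 0.3766, 8: 0.3274, 9: 0.5466, 10: 0.5260,
    11: 0.4925, 12: 0.5612, 13: 0.5035, 14: 0.5319, 15: 0.5083,
    16: 0.6302, 17: 0.6221, 18: 0.6385, 19: 0.6424, 20: 0.6508,
}
NUM_EPOCHS = 20
FINAL_TOKENS_SEEN = 7336064  # "Input Tokens Seen" after epoch 20

# Epoch with the lowest validation loss (matches the Loss reported at the top of the card).
best_epoch = min(VAL_LOSS, key=VAL_LOSS.get)
best_loss = VAL_LOSS[best_epoch]
final_loss = VAL_LOSS[NUM_EPOCHS]

# Epochs trained after the best evaluation, none of which improved on it.
epochs_after_best = NUM_EPOCHS - best_epoch

print(f"lowest validation loss: {best_loss} at epoch {best_epoch}")
print(f"final validation loss: {final_loss} ({final_loss / best_loss:.1f}x the lowest)")
print(f"epochs trained after the best one: {epochs_after_best}")
print(f"mean input tokens per epoch: {FINAL_TOKENS_SEEN / NUM_EPOCHS:,.0f}")
```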

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
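To compare a local environment against the versions listed above, here is a minimal sketch using the standard-library `importlib.metadata`. It assumes the PyPI distribution names `peft`, `transformers`, `torch`, `datasets` and `tokenizers`; `version_mismatches` is a hypothetical helper, not part of any of these libraries. Note that a `torch` wheel installed from PyPI reports `2.9.0` without the `+cu128` local tag, so an exact string match can flag a difference even for a compatible build.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from the "Framework versions" list, keyed by assumed PyPI distribution name.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def version_mismatches(expected=EXPECTED):
    """Return {package: (expected, installed_or_None)} for each package whose version differs."""
    mismatches = {}
    for pkg, want in expected.items():
        try:
            have = version(pkg)
        except PackageNotFoundError:
            have = None  # package not installed
        if have != want:
            mismatches[pkg] = (want, have)
    return mismatches


for pkg, (want, have) in version_mismatches().items():
    print(f"{pkg}: card lists {want}, installed {have}")
```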

Model tree for rbelanec/train_cola_42_1760637589

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Type: adapter (this model)