# train_cola_789_1760637932
This model is a PEFT fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:
- Loss: 0.2571
- Num Input Tokens Seen: 7327648
## Model description

More information needed
## Intended uses & limitations

More information needed
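As a rough starting point, the sketch below shows one way to load this adapter for inference with `peft` and `transformers`. It assumes the adapter is published under `rbelanec/train_cola_789_1760637932` and that you have access to the gated base checkpoint; the prompt wording is illustrative only, since the template used during training is not documented on this card.

```python
# Minimal inference sketch (assumptions noted above).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_789_1760637932"  # this repository

tokenizer = AutoTokenizer.from_pretrained(BASE_ID)
model = AutoModelForCausalLM.from_pretrained(
    BASE_ID, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, ADAPTER_ID)  # attach the adapter
model.eval()

# Illustrative prompt; the training prompt format is not documented here.
prompt = 'Is the following sentence grammatically acceptable? "The boy quickly ran."'
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```

Greedy decoding with a handful of new tokens is enough for a short yes/no style answer; adjust the generation settings to match whatever label format the adapter was actually trained to emit.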
## Training and evaluation data

More information needed
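The preprocessing for this run is not documented, but CoLA is the binary acceptability-classification task from the GLUE benchmark. A minimal sketch of loading it with the `datasets` library:

```python
# Sketch: load the CoLA subset of GLUE. The exact preprocessing and
# prompt construction used for this run are not documented on this card.
from datasets import load_dataset

cola = load_dataset("glue", "cola")
print(cola)              # train / validation / test splits
print(cola["train"][0])  # {'sentence': ..., 'label': 0 or 1, 'idx': ...}
```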
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 789
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
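For reference, here is a minimal sketch of how these values map onto `transformers.TrainingArguments`. The `output_dir` is a placeholder, and the original training script (including any PEFT-specific configuration) is not documented on this card.

```python
# Sketch: the hyperparameters listed above expressed as TrainingArguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_789_1760637932",  # placeholder, not from the run
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=789,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```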
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2779 | 1.0 | 1924 | 0.2667 | 365728 |
| 0.3874 | 2.0 | 3848 | 0.2572 | 731984 |
| 0.364 | 3.0 | 5772 | 0.2676 | 1098920 |
| 0.1556 | 4.0 | 7696 | 0.2792 | 1465464 |
| 0.2329 | 5.0 | 9620 | 0.2564 | 1831920 |
| 0.2149 | 6.0 | 11544 | 0.2562 | 2198176 |
| 0.2522 | 7.0 | 13468 | 0.2556 | 2564952 |
| 0.3015 | 8.0 | 15392 | 0.2574 | 2931096 |
| 0.231 | 9.0 | 17316 | 0.2585 | 3296808 |
| 0.2284 | 10.0 | 19240 | 0.2553 | 3663512 |
| 0.2682 | 11.0 | 21164 | 0.2575 | 4029608 |
| 0.2689 | 12.0 | 23088 | 0.2563 | 4395616 |
| 0.2326 | 13.0 | 25012 | 0.2546 | 4762456 |
| 0.2378 | 14.0 | 26936 | 0.2547 | 5128712 |
| 0.2312 | 15.0 | 28860 | 0.2517 | 5495008 |
| 0.2601 | 16.0 | 30784 | 0.2483 | 5861104 |
| 0.2167 | 17.0 | 32708 | 0.2445 | 6228320 |
| 0.25 | 18.0 | 34632 | 0.2414 | 6595032 |
| 0.1732 | 19.0 | 36556 | 0.2407 | 6961416 |
| 0.2559 | 20.0 | 38480 | 0.2405 | 7327648 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- Pytorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4