train_cola_123_1760637701

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set (see the loading sketch after this list):

  • Loss: 0.4581
  • Num Input Tokens Seen: 6529824
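
The Framework versions below list PEFT, and the checkpoint is an adapter rather than a full set of model weights. A minimal loading sketch, assuming the repo id rbelanec/train_cola_123_1760637701 (taken from this card) and hardware with bfloat16 support; the card itself does not include inference code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_123_1760637701"  # repo id taken from this card

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()
```

The prompt template used during fine-tuning is not documented here, so any inference prompt would need to match whatever format the training script applied to the CoLA examples.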

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (see the TrainingArguments sketch after this list):

  • learning_rate: 1e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 123
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
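
For reproducibility, the list above maps onto the Trainer API roughly as follows. This is a sketch under the assumption that training used transformers' standard TrainingArguments (the card does not include the training script); output_dir is illustrative, not from the card:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_123_1760637701",  # illustrative, not from the card
    learning_rate=1e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=123,
    optim="adamw_torch",
    adam_beta1=0.9,    # betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```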

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|--------------:|------:|------:|----------------:|------------------:|
| 0.234         | 2.0   | 3420  | 0.2600          | 652256            |
| 0.2307        | 4.0   | 6840  | 0.2553          | 1305952           |
| 0.2532        | 6.0   | 10260 | 0.2660          | 1959072           |
| 0.1979        | 8.0   | 13680 | 0.2601          | 2612576           |
| 0.1931        | 10.0  | 17100 | 0.2997          | 3265472           |
| 0.2323        | 12.0  | 20520 | 0.3114          | 3918784           |
| 0.2111        | 14.0  | 23940 | 0.3606          | 4570784           |
| 0.0853        | 16.0  | 27360 | 0.4002          | 5223456           |
| 0.2509        | 18.0  | 30780 | 0.4484          | 5876864           |
| 0.1481        | 20.0  | 34200 | 0.4581          | 6529824           |
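
Validation loss bottoms out at 0.2553 at epoch 4 and rises monotonically from epoch 8 onward, so the final epoch-20 checkpoint (0.4581) likely overfits relative to the earlier ones.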

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4