train_cola_456_1760637821

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1462 (the lowest validation loss reached during training, at epoch 18; see the table below)
  • Num Input Tokens Seen: 7334376
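The PEFT version listed under Framework versions suggests this checkpoint is a PEFT adapter rather than a full fine-tune. Below is a minimal sketch of loading it for inference, assuming that is the case and using this card's repo id (rbelanec/train_cola_456_1760637821); the prompt shown is an illustrative placeholder, since the exact prompt format used in training is not documented here.

```python
# Minimal sketch: load the adapter on top of the base model (assumptions noted above).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_456_1760637821"  # this card's repo id

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_id)  # attach the fine-tuned adapter
model.eval()

# Placeholder prompt; the training prompt format is not specified in this card.
inputs = tokenizer(
    "Is this sentence grammatically acceptable? 'The boy quickly ran.'",
    return_tensors="pt",
).to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Loading the adapter this way leaves the base weights untouched; `merge_and_unload()` can be called afterwards if a single merged model is preferred.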

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
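The training script itself is not included in this card, but as a rough sketch the settings above correspond to the following transformers TrainingArguments; output_dir is a placeholder, and any PEFT/LoRA configuration is omitted because it is not documented here.

```python
from transformers import TrainingArguments

# Rough reconstruction of the hyperparameters listed above.
training_args = TrainingArguments(
    output_dir="train_cola_456_1760637821",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,   # 10% of total steps used for warmup
    num_train_epochs=20,
)
```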

Training results

Training Loss   Epoch   Step    Validation Loss   Input Tokens Seen
0.3430          1.0     1924    0.2076            366712
0.1676          2.0     3848    0.1700            734016
0.1305          3.0     5772    0.1598            1100824
0.1038          4.0     7696    0.1545            1467248
0.1421          5.0     9620    0.1540            1834568
0.1661          6.0     11544   0.1500            2201464
0.1069          7.0     13468   0.1486            2568040
0.1344          8.0     15392   0.1488            2934360
0.1965          9.0     17316   0.1477            3301448
0.2095          10.0    19240   0.1473            3668312
0.1826          11.0    21164   0.1466            4034856
0.0752          12.0    23088   0.1464            4401344
0.1447          13.0    25012   0.1476            4767736
0.1490          14.0    26936   0.1475            5134344
0.1410          15.0    28860   0.1468            5501408
0.1746          16.0    30784   0.1468            5867920
0.1196          17.0    32708   0.1467            6234920
0.1090          18.0    34632   0.1462            6601944
0.0517          19.0    36556   0.1470            6968096
0.1629          20.0    38480   0.1465            7334376

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4