train_cola_42_1760637591

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

Loss: 0.1542
Num Input Tokens Seen: 7336064

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
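
To make the learning-rate settings above concrete, the following is a minimal pure-Python sketch of a cosine schedule with warmup. It assumes linear warmup from zero over the first 10% of steps, followed by a half-cosine decay to zero, and it assumes the scheduler horizon equals the 38480 steps reported in the training results. `cosine_lr` is a hypothetical helper written for illustration, not code from the training run.

```python
import math

BASE_LR = 5e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio
TOTAL_STEPS = 38480   # assumption: last step in the training results table


def cosine_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS,
              warmup_ratio=WARMUP_RATIO):
    """Return the learning rate at `step` (hypothetical helper)."""
    # Assumption: warmup length is the rounded product of ratio and total steps.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear warmup from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for s in (0, 1924, 3848, 19240, 38480):
        print(s, cosine_lr(s))
```

Under these assumptions the warmup ends at step 3848, which is the end of epoch 2 in the results table, and the rate decays toward zero by the final step.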

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.1344	1.0	1924	0.2083	366856
0.179	2.0	3848	0.1707	734320
0.1562	3.0	5772	0.1647	1100800
0.1315	4.0	7696	0.1599	1467824
0.0992	5.0	9620	0.1571	1834632
0.0625	6.0	11544	0.1609	2202264
0.1842	7.0	13468	0.1577	2568880
0.0837	8.0	15392	0.1569	2935520
0.1961	9.0	17316	0.1564	3302192
0.0907	10.0	19240	0.1542	3668584
0.1339	11.0	21164	0.1542	4034712
0.0926	12.0	23088	0.1545	4401480
0.1873	13.0	25012	0.1549	4768408
0.0718	14.0	26936	0.1564	5135240
0.1902	15.0	28860	0.1549	5501784
0.0978	16.0	30784	0.1547	5868800
0.1516	17.0	32708	0.1552	6235472
0.1028	18.0	34632	0.1552	6601760
0.1279	19.0	36556	0.1551	6968720
0.0436	20.0	38480	0.1549	7336064
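
As a small worked example, the validation losses in the table can be scanned for the lowest value. The sketch below copies the (epoch, validation loss) pairs from the table; `best_epoch` is a hypothetical helper, and resolving ties to the earliest epoch is a choice made here for illustration.

```python
# (epoch, validation_loss) pairs copied from the training results table
VAL_LOSS = [
    (1, 0.2083), (2, 0.1707), (3, 0.1647), (4, 0.1599), (5, 0.1571),
    (6, 0.1609), (7, 0.1577), (8, 0.1569), (9, 0.1564), (10, 0.1542),
    (11, 0.1542), (12, 0.1545), (13, 0.1549), (14, 0.1564), (15, 0.1549),
    (16, 0.1547), (17, 0.1552), (18, 0.1552), (19, 0.1551), (20, 0.1549),
]


def best_epoch(history):
    """Return the (epoch, loss) pair with the lowest validation loss."""
    # min() returns the first minimal item, so ties go to the earliest epoch.
    return min(history, key=lambda row: row[1])


if __name__ == "__main__":
    epoch, loss = best_epoch(VAL_LOSS)
    print(f"lowest validation loss {loss} first reached at epoch {epoch}")
```

The minimum of 0.1542 is reached at epochs 10 and 11 and matches the evaluation loss reported at the top of this card. The final-epoch validation loss is slightly higher at 0.1549.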

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
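
Since the card lists PEFT among the framework versions, the weights are presumably published as an adapter on top of the base model. The following is a hedged sketch of how such an adapter is typically loaded. It assumes a causal language modeling head and that the adapter is hosted as `rbelanec/train_cola_42_1760637591`. `load_adapter` is a hypothetical helper, and the prompt format used during fine-tuning is not documented here, so no inference prompt is shown.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_42_1760637591"  # assumption: adapter repo id


def load_adapter(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the PEFT adapter (hypothetical helper)."""
    # Local imports keep this file importable without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Assumption: the adapter was trained against a causal LM head.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    # Wrap the base model with the adapter weights.
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `tokenizer, model = load_adapter()` downloads both the base weights and the adapter, so it needs network access and whatever access the base model repository requires.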

Model tree for rbelanec/train_cola_42_1760637591

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model