# train_cola_42_1760637588
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:
- Loss: 0.2359
- Num Input Tokens Seen: 7336064

## Model description

More information needed

## Intended uses & limitations

More information needed
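
In the absence of documented usage, the following is a minimal inference sketch. It assumes the adapter is published at `rbelanec/train_cola_42_1760637588`, that it loads with PEFT's `AutoPeftModelForCausalLM` (consistent with the PEFT version listed under Framework versions), and that the tokenizer should be taken from the base model. The prompt is illustrative only; the template used during fine-tuning is not documented in this card.

```python
import torch
from transformers import AutoTokenizer
from peft import AutoPeftModelForCausalLM

base_model_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_42_1760637588"

# Load the tokenizer from the base model, since the adapter repo may not ship one.
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# AutoPeftModelForCausalLM downloads the base weights and applies the adapter.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Illustrative CoLA-style prompt (acceptability judgment); adjust to whatever
# template the adapter was actually trained with.
prompt = "Is the following sentence grammatically acceptable? The boy laughed."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```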

## Training and evaluation data

More information needed
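
The exact data source and preprocessing are not documented. Assuming "cola" refers to the GLUE CoLA subset on the Hugging Face Hub, it can be loaded as in this sketch:

```python
from datasets import load_dataset

# GLUE's CoLA config: single sentences labeled 1 (acceptable) or 0 (unacceptable).
dataset = load_dataset("glue", "cola")
print(dataset["train"][0])
# e.g. {'sentence': "Our friends won't buy this analysis, ...", 'label': 1, 'idx': 0}
```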

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 4
- eval_batch_size: 4
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 20
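
As a rough guide to reproducing this configuration, the sketch below mirrors the list above in `transformers.TrainingArguments`; the actual training script and PEFT adapter configuration are not included in this card, so `output_dir` and any argument not listed above are assumptions.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_42_1760637588",  # assumed; taken from the model name
    learning_rate=1e-3,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```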

### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.1875 | 1.0 | 1924 | 0.1632 | 366856 |
| 0.1215 | 2.0 | 3848 | 0.1599 | 734320 |
| 0.1298 | 3.0 | 5772 | 0.1463 | 1100800 |
| 0.1182 | 4.0 | 7696 | 0.1624 | 1467824 |
| 0.0837 | 5.0 | 9620 | 0.1449 | 1834632 |
| 0.0851 | 6.0 | 11544 | 0.1475 | 2202264 |
| 0.1248 | 7.0 | 13468 | 0.1509 | 2568880 |
| 0.0537 | 8.0 | 15392 | 0.1524 | 2935520 |
| 0.1729 | 9.0 | 17316 | 0.1479 | 3302192 |
| 0.0321 | 10.0 | 19240 | 0.1585 | 3668584 |
| 0.0463 | 11.0 | 21164 | 0.1689 | 4034712 |
| 0.0419 | 12.0 | 23088 | 0.1786 | 4401480 |
| 0.0523 | 13.0 | 25012 | 0.2057 | 4768408 |
| 0.0038 | 14.0 | 26936 | 0.2443 | 5135240 |
| 0.1481 | 15.0 | 28860 | 0.2262 | 5501784 |
| 0.0023 | 16.0 | 30784 | 0.2471 | 5868800 |
| 0.0065 | 17.0 | 32708 | 0.2818 | 6235472 |
| 0.0062 | 18.0 | 34632 | 0.2778 | 6601760 |
| 0.0028 | 19.0 | 36556 | 0.2833 | 6968720 |
| 0.0025 | 20.0 | 38480 | 0.2888 | 7336064 |

### Framework versions

- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.0+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4