train_cola_42_1760637590

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set:

  • Loss: 1.2020
  • Num Input Tokens Seen: 7336064
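
Since this repository is a PEFT adapter on top of meta-llama/Meta-Llama-3-8B-Instruct, loading it for inference might look like the sketch below. This is an assumption-laden sketch, not documented usage: the repo id rbelanec/train_cola_42_1760637590 is taken from this page, the AutoPeftModelForCausalLM path assumes a standard PEFT adapter layout, and the prompt wording is invented since the training prompt format is not documented here.

```python
# Hedged sketch: load the adapter and run one generation.
# Assumptions: standard PEFT adapter layout, and an invented prompt
# (the actual cola prompt template used in training is not documented here).
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_cola_42_1760637590",  # adapter repo; PEFT resolves the base model
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

prompt = "Is this sentence grammatically acceptable? 'The boy quickly ran.'"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```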

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a sketch mapping them onto TrainingArguments follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 20
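
Assuming the run used the Hugging Face Trainer (the hyperparameter names above match its conventions), they translate into transformers.TrainingArguments roughly as follows. output_dir is a hypothetical placeholder; everything else mirrors the list.

```python
# Sketch only: the listed hyperparameters expressed as TrainingArguments.
# The real training script is not shown on this page.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="train_cola_42_1760637590",  # placeholder path, not from the source
    learning_rate=5e-05,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=20,
)
```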

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|---------------|-------|-------|-----------------|-------------------|
| 1.3068        | 1.0   | 1924  | 1.3393          | 366856            |
| 1.1491        | 2.0   | 3848  | 1.2248          | 734320            |
| 1.1945        | 3.0   | 5772  | 1.2107          | 1100800           |
| 1.2357        | 4.0   | 7696  | 1.2025          | 1467824           |
| 1.2359        | 5.0   | 9620  | 1.2086          | 1834632           |
| 0.9361        | 6.0   | 11544 | 1.2031          | 2202264           |
| 1.2359        | 7.0   | 13468 | 1.2184          | 2568880           |
| 0.8426        | 8.0   | 15392 | 1.2020          | 2935520           |
| 1.0213        | 9.0   | 17316 | 1.2108          | 3302192           |
| 1.1058        | 10.0  | 19240 | 1.2071          | 3668584           |
| 1.5195        | 11.0  | 21164 | 1.2050          | 4034712           |
| 1.0126        | 12.0  | 23088 | 1.2131          | 4401480           |
| 1.2338        | 13.0  | 25012 | 1.2051          | 4768408           |
| 0.8327        | 14.0  | 26936 | 1.2172          | 5135240           |
| 1.5627        | 15.0  | 28860 | 1.2172          | 5501784           |
| 1.2012        | 16.0  | 30784 | 1.2172          | 5868800           |
| 1.1182        | 17.0  | 32708 | 1.2172          | 6235472           |
| 1.2807        | 18.0  | 34632 | 1.2172          | 6601760           |
| 0.9395        | 19.0  | 36556 | 1.2172          | 6968720           |
| 0.8826        | 20.0  | 38480 | 1.2172          | 7336064           |

Note that the reported evaluation loss of 1.2020 matches the epoch-8 checkpoint (step 15392) rather than the final epoch (1.2172), suggesting the best checkpoint was retained for evaluation.

Framework versions

  • PEFT 0.17.1
  • Transformers 4.51.3
  • PyTorch 2.9.0+cu128
  • Datasets 4.0.0
  • Tokenizers 0.21.4
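
For reproducibility, the pinned versions above correspond to a requirements file along these lines (one assumption: the +cu128 PyTorch build must come from the matching CUDA wheel index, which a plain pip install will not select by default):

```
peft==0.17.1
transformers==4.51.3
torch==2.9.0  # +cu128 build; install from the cu128 wheel index
datasets==4.0.0
tokenizers==0.21.4
```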