train_cola_42_1760637587

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

Loss: 0.2506
Num Input Tokens Seen: 7336064

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 0.03
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
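To make the interaction between learning_rate, lr_scheduler_type and lr_scheduler_warmup_ratio concrete, here is a minimal sketch of the schedule these settings imply. It assumes linear warmup from zero followed by a half-cosine decay to zero, and takes the total step count (38480) from the final row of the training results table. The function name `lr_at` and the rounding used for the warmup step count are illustrative choices, not taken from the training code.

```python
import math

# Values from the hyperparameter list; TOTAL_STEPS is the last step in the results table.
PEAK_LR = 0.03
TOTAL_STEPS = 38480
WARMUP_RATIO = 0.1

# Assumption: warmup covers 10% of all optimizer steps.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step (hypothetical helper)."""
    if step < WARMUP_STEPS:
        # Linear ramp from 0 up to the peak learning rate.
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    # Half-cosine decay from the peak down to 0.
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1924, WARMUP_STEPS, 19240, TOTAL_STEPS):
        print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the peak rate of 0.03 is reached at the end of epoch 2 (step 3848) and decays toward zero by step 38480.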

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.2732	1.0	1924	0.2556	366856
0.2665	2.0	3848	0.2570	734320
0.2461	3.0	5772	0.2515	1100800
0.2567	4.0	7696	0.2514	1467824
0.2673	5.0	9620	0.2521	1834632
0.1521	6.0	11544	0.2539	2202264
0.2739	7.0	13468	0.2522	2568880
0.2723	8.0	15392	0.2563	2935520
0.2473	9.0	17316	0.2546	3302192
0.1929	10.0	19240	0.2513	3668584
0.2122	11.0	21164	0.2506	4034712
0.3099	12.0	23088	0.2512	4401480
0.2790	13.0	25012	0.2548	4768408
0.2718	14.0	26936	0.2515	5135240
0.2592	15.0	28860	0.2508	5501784
0.2047	16.0	30784	0.2515	5868800
0.3073	17.0	32708	0.2524	6235472
0.2644	18.0	34632	0.2520	6601760
0.3103	19.0	36556	0.2518	6968720
0.2373	20.0	38480	0.2518	7336064
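The table is easier to read with the best epoch singled out. The short sketch below copies the Validation Loss column verbatim and picks its minimum; the variable names are illustrative.

```python
# Validation loss per epoch, copied from the table above (index 0 is epoch 1).
VALIDATION_LOSS = [
    0.2556, 0.2570, 0.2515, 0.2514, 0.2521,
    0.2539, 0.2522, 0.2563, 0.2546, 0.2513,
    0.2506, 0.2512, 0.2548, 0.2515, 0.2508,
    0.2515, 0.2524, 0.2520, 0.2518, 0.2518,
]

# Pair each loss with its 1-based epoch number and take the smallest loss.
best_epoch, best_loss = min(
    enumerate(VALIDATION_LOSS, start=1), key=lambda pair: pair[1]
)

print(best_epoch, best_loss)  # 11 0.2506
```

The minimum, 0.2506 at epoch 11, matches the evaluation loss reported at the top of this card. Validation loss stays within a narrow band (0.2506 to 0.2570) across all 20 epochs.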

Framework versions

PEFT 0.17.1
Transformers 4.51.3
PyTorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
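Since the model was trained with PEFT, it can presumably be loaded as an adapter on top of the base model. The following is a rough sketch rather than a verified recipe. It assumes the adapter is published under the repository ID rbelanec/train_cola_42_1760637587, that it is a causal-LM adapter, and that you have access to the base model weights. The helper name `load_adapter_model` is hypothetical.

```python
# Assumed repository IDs; check them against the hosting page before use.
ADAPTER_ID = "rbelanec/train_cola_42_1760637587"
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"


def load_adapter_model(adapter_id: str = ADAPTER_ID, base_model_id: str = BASE_MODEL_ID):
    """Load the base model with the adapter attached (hypothetical helper)."""
    # Imported lazily so that defining this function does not require the
    # heavy dependencies to be installed.
    from peft import AutoPeftModelForCausalLM
    from transformers import AutoTokenizer

    # AutoPeftModelForCausalLM reads the adapter config, loads the base model
    # it points to, and attaches the adapter weights.
    model = AutoPeftModelForCausalLM.from_pretrained(adapter_id)
    # Assumption: the tokenizer is taken from the base model repository.
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    return model, tokenizer
```

Calling `load_adapter_model()` downloads the 8B base model, so it needs substantial disk space and memory. The prompt format used during fine-tuning is not documented in this card, so inference prompts may need some experimentation.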

Model tree for rbelanec/train_cola_42_1760637587

Base model: meta-llama/Meta-Llama-3-8B-Instruct
This model: adapter