train_cola_789_1768397605

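This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set. The loss reported here matches the lowest validation loss in the training results table (epoch 1.0, step 3848); the token count is the total after all 10 epochs.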

Loss: 0.2024
Num Input Tokens Seen: 3459744

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 2
eval_batch_size: 2
seed: 789
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
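
To make the schedule concrete, the following is a minimal sketch of how the learning rate would evolve under linear warmup followed by a half-cosine decay to zero. It is not the training code itself. The total step count (38480) is taken from the last row of the training results table; the warmup length is assumed to be the warmup ratio times the total step count, and the names `cosine_lr`, `WARMUP_STEPS`, and `TOTAL_STEPS` are illustrative only.

```python
import math

# Values taken from the hyperparameter list and the training results table.
BASE_LR = 5e-05
WARMUP_RATIO = 0.1
TOTAL_STEPS = 38480  # final step in the training results table

# Assumption: warmup length is the warmup ratio times the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def cosine_lr(step: int) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    if step < WARMUP_STEPS:
        return BASE_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return BASE_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for step in (0, 1924, 3848, 21164, 38480):
        print(f"step {step:>5}: lr = {cosine_lr(step):.3e}")
```

Under these assumptions the warmup covers 3848 steps, which is exactly one epoch in the training results table, and the learning rate peaks at 5e-05 at the end of epoch 1.0.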

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.0563	0.5	1924	0.2546	172704
0.3442	1.0	3848	0.2024	345704
0.3025	1.5	5772	0.2383	518760
0.0646	2.0	7696	0.2359	691408
0.0418	2.5	9620	0.2663	865168
0.6154	3.0	11544	0.2419	1037864
0.0014	3.5	13468	0.2601	1210568
0.0008	4.0	15392	0.2400	1383872
0.0015	4.5	17316	0.2624	1557648
0.0007	5.0	19240	0.2528	1729688
0.1866	5.5	21164	0.2502	1902632
0.0013	6.0	23088	0.2731	2075456
0.1663	6.5	25012	0.2795	2248320
0.0026	7.0	26936	0.2680	2421448
0.1670	7.5	28860	0.2922	2594888
0.1602	8.0	30784	0.3015	2767560
0.4744	8.5	32708	0.2975	2940312
0.0008	9.0	34632	0.3114	3113600
0.0021	9.5	36556	0.3069	3286752
0.5644	10.0	38480	0.3069	3459744
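
The table can be scanned programmatically to find the checkpoint with the lowest validation loss. The sketch below copies the epoch, step, and validation loss columns from the table above; the helper name `best_checkpoint` is illustrative and not part of any library.

```python
# (epoch, step, validation_loss) rows copied from the training results table.
ROWS = [
    (0.5, 1924, 0.2546),
    (1.0, 3848, 0.2024),
    (1.5, 5772, 0.2383),
    (2.0, 7696, 0.2359),
    (2.5, 9620, 0.2663),
    (3.0, 11544, 0.2419),
    (3.5, 13468, 0.2601),
    (4.0, 15392, 0.2400),
    (4.5, 17316, 0.2624),
    (5.0, 19240, 0.2528),
    (5.5, 21164, 0.2502),
    (6.0, 23088, 0.2731),
    (6.5, 25012, 0.2795),
    (7.0, 26936, 0.2680),
    (7.5, 28860, 0.2922),
    (8.0, 30784, 0.3015),
    (8.5, 32708, 0.2975),
    (9.0, 34632, 0.3114),
    (9.5, 36556, 0.3069),
    (10.0, 38480, 0.3069),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[2])


best_epoch, best_step, best_loss = best_checkpoint(ROWS)
final_epoch, final_step, final_loss = ROWS[-1]

if __name__ == "__main__":
    print(f"best:  epoch {best_epoch}, step {best_step}, val loss {best_loss}")
    print(f"final: epoch {final_epoch}, step {final_step}, val loss {final_loss}")
```

The lowest validation loss occurs at epoch 1.0, and the validation loss at epoch 10.0 is higher. Together with training losses that often fall close to zero, this pattern is consistent with overfitting in later epochs, although the card does not say which checkpoint was saved.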

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.1+cu128
Datasets 4.0.0
Tokenizers 0.21.4
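
When reproducing results, it can help to compare a local environment against the versions listed above. The following is a small sketch using only the standard library; the distribution names in `EXPECTED` (for example `torch` for Pytorch) are assumptions about how the packages are published, and `version_mismatches` is a hypothetical helper.

```python
from importlib import metadata

# Versions listed under "Framework versions". The distribution names
# (for example "torch" for Pytorch) are assumptions.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.1+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def version_mismatches(expected):
    """Map each package whose installed version differs to (expected, installed)."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in version_mismatches(EXPECTED).items():
        print(f"{name}: expected {wanted}, found {installed}")
```

A mismatch does not necessarily mean the adapter will fail to load; it only flags where the environment differs from the one recorded here.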

Model tree for rbelanec/train_cola_789_1768397605

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
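
Since this model is a PEFT adapter on top of the base model, one plausible way to load it is sketched below. This is not taken from the original card. It assumes the adapter was trained on the causal language modeling head, that you have access to the base model weights, and that enough memory is available for an 8B-parameter model. The function name `load_adapter` is illustrative.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_789_1768397605"


def load_adapter(base_model_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights (a sketch)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Assumption: the adapter targets the causal language modeling architecture.
    base_model = AutoModelForCausalLM.from_pretrained(base_model_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode
    return tokenizer, model
```

Calling `load_adapter()` downloads the base model and the adapter, so it is best run once and the returned objects reused.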