train_cola_101112_1760638048

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset. It achieves the following results on the evaluation set:

Loss: 0.1406
Num Input Tokens Seen: 7325256

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 101112
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 20
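To make the schedule concrete, the following is a minimal sketch of a cosine schedule with linear warmup, using the values listed above. It assumes the common form (linear ramp to the peak rate, then half a cosine period down to zero) and a total of 38480 optimizer steps, taken from the last row of the training results table. The card does not confirm the exact scheduler implementation, so `lr_at` and the rounding of the warmup length are illustrative assumptions.

```python
import math

# Values taken from the hyperparameter list and the results table in this card.
PEAK_LR = 5e-05
WARMUP_RATIO = 0.1
STEPS_PER_EPOCH = 1924
TOTAL_STEPS = 20 * STEPS_PER_EPOCH  # 38480

# Assumption: warmup length is the warmup ratio applied to the total step count.
WARMUP_STEPS = round(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate at a given step: linear warmup, then cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    for epoch in (1, 2, 10, 20):
        step = epoch * STEPS_PER_EPOCH
        print(f"epoch {epoch:>2} (step {step:>5}): lr = {lr_at(step):.3e}")
```

Under these assumptions the warmup covers the first 3848 steps, which is the first two epochs, and the rate decays from there until the end of epoch 20.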

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.362	1.0	1924	0.1874	366136
0.1343	2.0	3848	0.1626	732880
0.0912	3.0	5772	0.1570	1099816
0.0998	4.0	7696	0.1502	1465464
0.152	5.0	9620	0.1472	1831728
0.1648	6.0	11544	0.1446	2198176
0.1692	7.0	13468	0.1489	2564208
0.1091	8.0	15392	0.1440	2930240
0.1358	9.0	17316	0.1425	3297136
0.0906	10.0	19240	0.1429	3663392
0.1369	11.0	21164	0.1406	4028760
0.1161	12.0	23088	0.1461	4394320
0.1274	13.0	25012	0.1438	4761000
0.1108	14.0	26936	0.1421	5127440
0.1319	15.0	28860	0.1421	5494368
0.1346	16.0	30784	0.1416	5860888
0.1997	17.0	32708	0.1416	6226952
0.0858	18.0	34632	0.1416	6593400
0.082	19.0	36556	0.1418	6959600
0.1285	20.0	38480	0.1421	7325256
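The table above can be checked programmatically. The sketch below copies the epoch, validation loss, and token columns from the table and picks the row with the lowest validation loss; the helper names are illustrative only. The reported evaluation loss of 0.1406 matches the epoch 11 row rather than the final one. Whether the published adapter corresponds to that checkpoint is not stated in the card.

```python
# (epoch, validation_loss, input_tokens_seen) copied from the results table.
RESULTS = [
    (1, 0.1874, 366136), (2, 0.1626, 732880), (3, 0.1570, 1099816),
    (4, 0.1502, 1465464), (5, 0.1472, 1831728), (6, 0.1446, 2198176),
    (7, 0.1489, 2564208), (8, 0.1440, 2930240), (9, 0.1425, 3297136),
    (10, 0.1429, 3663392), (11, 0.1406, 4028760), (12, 0.1461, 4394320),
    (13, 0.1438, 4761000), (14, 0.1421, 5127440), (15, 0.1421, 5494368),
    (16, 0.1416, 5860888), (17, 0.1416, 6226952), (18, 0.1416, 6593400),
    (19, 0.1418, 6959600), (20, 0.1421, 7325256),
]


def best_row(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[1])


def mean_tokens_per_epoch(rows):
    """Average number of input tokens seen per epoch, from the last row."""
    last_epoch, _, tokens_seen = rows[-1]
    return tokens_seen / last_epoch


if __name__ == "__main__":
    epoch, loss, _ = best_row(RESULTS)
    print(f"lowest validation loss {loss} at epoch {epoch}")
    print(f"final validation loss {RESULTS[-1][1]}")
    print(f"mean tokens per epoch {mean_tokens_per_epoch(RESULTS):.1f}")
```

Validation loss changes little after epoch 11, staying between 0.1416 and 0.1461 for the remaining epochs.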

Framework versions

PEFT 0.17.1
Transformers 4.51.3
Pytorch 2.9.0+cu128
Datasets 4.0.0
Tokenizers 0.21.4
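When reproducing results, it can help to compare the local environment against the versions listed above. This is a small sketch using only the standard library. The distribution names (`peft`, `transformers`, `torch`, `datasets`, `tokenizers`) are assumed to be the usual PyPI names, and `compare_versions` is a hypothetical helper, not part of any of these libraries.

```python
from importlib import metadata

# Versions listed in this card; the distribution names are an assumption.
EXPECTED = {
    "peft": "0.17.1",
    "transformers": "4.51.3",
    "torch": "2.9.0+cu128",
    "datasets": "4.0.0",
    "tokenizers": "0.21.4",
}


def compare_versions(expected=EXPECTED):
    """Map each package to (expected, installed or None, exact match?)."""
    report = {}
    for name, wanted in expected.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (wanted, installed, installed == wanted)
    return report


if __name__ == "__main__":
    for name, (wanted, installed, same) in compare_versions().items():
        status = "ok" if same else "differs"
        print(f"{name}: expected {wanted}, installed {installed} ({status})")
```

An exact match is not necessarily required to load the adapter; the comparison only flags differences from the training environment.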

Model tree for rbelanec/train_cola_101112_1760638048

Base model: meta-llama/Meta-Llama-3-8B-Instruct
Adapter: this model
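Since the model tree lists this repository as an adapter on top of the base model, loading it would typically mean loading the base weights first and then attaching the adapter with PEFT. The sketch below assumes the adapter was trained for causal language modeling and that both repositories are accessible from the Hub (the base model may require accepting its license). The function name is illustrative, and the prompt format used during training is not documented in this card.

```python
BASE_MODEL_ID = "meta-llama/Meta-Llama-3-8B-Instruct"
ADAPTER_ID = "rbelanec/train_cola_101112_1760638048"


def load_adapter_model(base_id: str = BASE_MODEL_ID, adapter_id: str = ADAPTER_ID):
    """Load the base model and attach the adapter weights (downloads both)."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    from peft import PeftModel
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    # Assumption: the adapter targets a causal language modeling head.
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model


if __name__ == "__main__":
    tokenizer, model = load_adapter_model()
    print(type(model).__name__)
```

Loading an 8B-parameter base model needs substantial memory, so in practice a reduced-precision dtype or a device map would likely be passed to `from_pretrained`.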