train_cola_1757340210

This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct on the cola dataset (CoLA, the Corpus of Linguistic Acceptability). It achieves the following results on the evaluation set:

  • Loss: 0.1923
  • Num Input Tokens Seen: 3668312
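
Because this is a PEFT adapter rather than a full model, the weights must be loaded on top of the base meta-llama/Meta-Llama-3-8B-Instruct checkpoint. Below is a minimal loading sketch; the adapter id is taken from this card, but the prompt template used during training is not documented here, so the CoLA-style prompt is only an assumption.

```python
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

# Downloads the base model and applies the adapter weights on top of it.
model = AutoPeftModelForCausalLM.from_pretrained(
    "rbelanec/train_cola_1757340210",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# CoLA is binary acceptability classification; this prompt format is an
# assumption, not the one recorded in the training script.
prompt = (
    "Is the following sentence grammatically acceptable? Answer yes or no.\n"
    "Sentence: The cat sat on the mat."
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```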

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 456
  • optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 10.0
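
For reference, these settings map onto Hugging Face TrainingArguments roughly as sketched below. This is not the original training script: the LoRA/PEFT configuration, dataset preprocessing, and evaluation cadence (every 0.5 epochs, judging by the results table) are not recorded in this card, and the output_dir is assumed from the model name.

```python
from transformers import TrainingArguments

# Mirrors only the hyperparameters listed above; all other values are defaults.
training_args = TrainingArguments(
    output_dir="train_cola_1757340210",  # assumed, matching this card's name
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```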

Training results

| Training Loss | Epoch | Step  | Validation Loss | Input Tokens Seen |
|:-------------:|:-----:|:-----:|:---------------:|:-----------------:|
| 0.2487        | 0.5   | 962   | 0.2167          | 183008            |
| 0.5743        | 1.0   | 1924  | 0.2664          | 366712            |
| 0.1434        | 1.5   | 2886  | 0.2042          | 550360            |
| 0.197         | 2.0   | 3848  | 0.1923          | 734016            |
| 0.2943        | 2.5   | 4810  | 0.2202          | 917408            |
| 0.1381        | 3.0   | 5772  | 0.2203          | 1100824           |
| 0.0434        | 3.5   | 6734  | 0.2658          | 1283896           |
| 0.1102        | 4.0   | 7696  | 0.2310          | 1467248           |
| 0.105         | 4.5   | 8658  | 0.2265          | 1651280           |
| 0.2117        | 5.0   | 9620  | 0.2244          | 1834568           |
| 0.2033        | 5.5   | 10582 | 0.2384          | 2017960           |
| 0.0137        | 6.0   | 11544 | 0.2251          | 2201464           |
| 0.1063        | 6.5   | 12506 | 0.2567          | 2384536           |
| 0.1768        | 7.0   | 13468 | 0.2313          | 2568040           |
| 0.0031        | 7.5   | 14430 | 0.2396          | 2750664           |
| 0.1833        | 8.0   | 15392 | 0.2406          | 2934360           |
| 0.0034        | 8.5   | 16354 | 0.2428          | 3118424           |
| 0.4312        | 9.0   | 17316 | 0.2427          | 3301448           |
| 0.1988        | 9.5   | 18278 | 0.2396          | 3485512           |
| 0.3332        | 10.0  | 19240 | 0.2434          | 3668312           |
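
The reported evaluation loss of 0.1923 corresponds to the epoch 2.0 checkpoint, the minimum validation loss in this run; validation loss never improves on it in later epochs, suggesting the model had effectively converged by epoch 2.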

Framework versions

  • PEFT 0.15.2
  • Transformers 4.51.3
  • Pytorch 2.8.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.21.1