# train_cola_1757340212
This model is a fine-tuned version of [meta-llama/Meta-Llama-3-8B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct) on the CoLA (Corpus of Linguistic Acceptability) dataset. It achieves the following results on the evaluation set:
- Loss: 0.1312
- Num input tokens seen: 3,668,312
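Since the framework versions below include PEFT, this repository presumably hosts a parameter-efficient adapter rather than full model weights. Below is a minimal loading and inference sketch under that assumption; the prompt wording is illustrative, as the card does not document the template used during fine-tuning.

```python
# Minimal sketch -- assumes this repo hosts a PEFT adapter on top of
# meta-llama/Meta-Llama-3-8B-Instruct; the prompt format is a guess.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "rbelanec/train_cola_1757340212"

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)
model.eval()

# Hypothetical CoLA-style acceptability prompt (not confirmed by the card).
messages = [{
    "role": "user",
    "content": "Is the following sentence grammatically acceptable? "
               "Sentence: The book was written by Mary.",
}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=8, do_sample=False)
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```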
## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 5e-05
- train_batch_size: 4
- eval_batch_size: 4
- seed: 456
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10.0
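For reference, here is a sketch of how these settings map onto Hugging Face `TrainingArguments`, assuming the run used the standard `Trainer`; `output_dir` is a placeholder.

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments; assumes the
# standard Hugging Face Trainer was used. output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="train_cola_1757340212",
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=456,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10.0,
)
```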
### Training results

| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.2618 | 0.5 | 962 | 0.1746 | 183008 |
| 0.1814 | 1.0 | 1924 | 0.1962 | 366712 |
| 0.0641 | 1.5 | 2886 | 0.1630 | 550360 |
| 0.2333 | 2.0 | 3848 | 0.1312 | 734016 |
| 0.1393 | 2.5 | 4810 | 0.1553 | 917408 |
| 0.0033 | 3.0 | 5772 | 0.2327 | 1100824 |
| 0.0011 | 3.5 | 6734 | 0.2591 | 1283896 |
| 0.061 | 4.0 | 7696 | 0.1798 | 1467248 |
| 0.0011 | 4.5 | 8658 | 0.2695 | 1651280 |
| 0.0011 | 5.0 | 9620 | 0.2479 | 1834568 |
| 0.0014 | 5.5 | 10582 | 0.2734 | 2017960 |
| 0.0005 | 6.0 | 11544 | 0.3183 | 2201464 |
| 0.0005 | 6.5 | 12506 | 0.3548 | 2384536 |
| 0.0001 | 7.0 | 13468 | 0.3410 | 2568040 |
| 0.0001 | 7.5 | 14430 | 0.3688 | 2750664 |
| 0.0 | 8.0 | 15392 | 0.4112 | 2934360 |
| 0.0 | 8.5 | 16354 | 0.4241 | 3118424 |
| 0.0 | 9.0 | 17316 | 0.4777 | 3301448 |
| 0.0 | 9.5 | 18278 | 0.4891 | 3485512 |
| 0.0 | 10.0 | 19240 | 0.4903 | 3668312 |

Validation loss bottoms out at 0.1312 at epoch 2.0 and climbs steadily afterward while training loss collapses toward zero, a typical sign of overfitting; the headline loss above corresponds to that epoch-2.0 checkpoint.
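The card reports only the evaluation loss. CoLA is conventionally scored with the Matthews correlation coefficient (MCC); a hypothetical scoring sketch using scikit-learn follows (parsing labels from the model's generations is assumed, not documented here):

```python
from sklearn.metrics import matthews_corrcoef

# Hypothetical scoring sketch: y_true are gold CoLA labels (1 = acceptable),
# y_pred are labels parsed from the model's generated answers.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0]
print(f"MCC: {matthews_corrcoef(y_true, y_pred):.3f}")
```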
### Framework versions

- PEFT 0.15.2
- Transformers 4.51.3
- PyTorch 2.8.0+cu128
- Datasets 3.6.0
- Tokenizers 0.21.1