# train_cola_42_1763998305
This model is a fine-tuned version of [meta-llama/Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) on the cola dataset. It achieves the following results on the evaluation set:
- Loss: 0.2561 (the lowest validation loss of the run, reached at epoch 3; see the results table below)
- Num Input Tokens Seen: 3463336 (cumulative over all 10 epochs)
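The framework versions below list PEFT, so this checkpoint is presumably a PEFT adapter on top of the base model rather than a full set of weights. A minimal loading sketch, assuming the adapter lives in this card's repo (`rbelanec/train_cola_42_1763998305`); the prompt format used during training is not documented here:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id)

# Load the fine-tuned adapter on top of the base model.
# Assumption: the repo contains a PEFT adapter, as the PEFT version below suggests.
model = PeftModel.from_pretrained(base, "rbelanec/train_cola_42_1763998305")
model.eval()
```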
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
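The card gives no details beyond the dataset name. CoLA (the Corpus of Linguistic Acceptability) is commonly loaded from the GLUE benchmark; a minimal sketch, assuming that source:

```python
from datasets import load_dataset

# Assumption: "cola" here refers to the GLUE CoLA task; the card does not
# confirm the exact source or the preprocessing used for training.
dataset = load_dataset("glue", "cola")
print(dataset["train"][0])  # fields: 'sentence', 'label' (1 = acceptable, 0 = not), 'idx'
```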
## Training procedure
### Training hyperparameters

The following hyperparameters were used during training (see the `TrainingArguments` sketch after this list):
- learning_rate: 5e-05
- train_batch_size: 2
- eval_batch_size: 2
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 10
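The same settings, expressed as a `transformers.TrainingArguments` sketch; `output_dir` is a placeholder, and the actual training script is not published with this card:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="train_cola_42_1763998305",  # placeholder
    learning_rate=5e-05,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=10,
    # Enables the "input tokens seen" counter reported in the results table.
    include_num_input_tokens_seen=True,
)
```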
### Training results
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 0.474 | 0.5 | 1924 | 0.4129 | 173168 |
| 0.1377 | 1.0 | 3848 | 0.2851 | 346040 |
| 0.3359 | 1.5 | 5772 | 0.2766 | 518984 |
| 0.0066 | 2.0 | 7696 | 0.3507 | 692368 |
| 0.0329 | 2.5 | 9620 | 0.2888 | 866112 |
| 0.1656 | 3.0 | 11544 | 0.2561 | 1039080 |
| 0.0726 | 3.5 | 13468 | 0.2908 | 1212120 |
| 0.3968 | 4.0 | 15392 | 0.2934 | 1385192 |
| 0.5147 | 4.5 | 17316 | 0.3050 | 1558248 |
| 0.223 | 5.0 | 19240 | 0.2613 | 1731824 |
| 0.2536 | 5.5 | 21164 | 0.3119 | 1904960 |
| 0.5499 | 6.0 | 23088 | 0.3149 | 2078408 |
| 0.3069 | 6.5 | 25012 | 0.2976 | 2251848 |
| 0.3753 | 7.0 | 26936 | 0.2983 | 2424592 |
| 0.2871 | 7.5 | 28860 | 0.3098 | 2597104 |
| 0.3502 | 8.0 | 30784 | 0.3049 | 2770768 |
| 0.4523 | 8.5 | 32708 | 0.3036 | 2944224 |
| 0.2483 | 9.0 | 34632 | 0.3079 | 3117120 |
| 0.0411 | 9.5 | 36556 | 0.3086 | 3290224 |
| 0.0015 | 10.0 | 38480 | 0.3065 | 3463336 |
### Framework versions
- PEFT 0.17.1
- Transformers 4.51.3
- PyTorch 2.9.1+cu128
- Datasets 4.0.0
- Tokenizers 0.21.4