train_wsc_42_1763998310

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the wsc dataset. Validation loss on the evaluation set over the course of training is reported in the training results table below.

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
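The cosine schedule with a 0.1 warmup ratio can be sketched as below. This is a minimal sketch, assuming the scheduler follows the common shape of linear warmup followed by a half-cosine decay to zero (the shape used by Hugging Face's `get_cosine_schedule_with_warmup` with its default single half cycle). The total of 1250 optimizer steps is an assumption inferred from the results table (63 steps correspond to 0.504 epochs, i.e. 125 steps per epoch, times 10 epochs); the card does not state it. The helper `cosine_with_warmup_lr` is hypothetical.

```python
import math


def cosine_with_warmup_lr(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Hypothetical helper: learning rate at `step` for linear warmup then cosine decay."""
    # Warmup length derived from the ratio, rounded up to a whole step.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0, clamped at 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


BASE_LR = 5e-05          # learning_rate from the card
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio from the card
STEPS_PER_EPOCH = 125    # assumption: inferred from the results table, not stated in the card
NUM_EPOCHS = 10          # num_epochs from the card
TOTAL_STEPS = STEPS_PER_EPOCH * NUM_EPOCHS

for step in (0, 63, 125, 630, 1197, TOTAL_STEPS):
    lr = cosine_with_warmup_lr(step, BASE_LR, TOTAL_STEPS, WARMUP_RATIO)
    print(f"step {step:5d}: lr = {lr:.3e}")
```

Under these assumptions the learning rate peaks at 5e-05 at the end of the first epoch and decays smoothly toward zero by the last step.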

Training results

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.8480	0.504	63	1.0210	24288
0.4212	1.008	126	0.4317	49584
0.3622	1.512	189	0.3481	74512
0.3622	2.016	252	0.3425	99264
0.3501	2.520	315	0.3444	123360
0.3532	3.024	378	0.3470	149120
0.3432	3.528	441	0.3454	174208
0.3477	4.032	504	0.3446	198016
0.3444	4.536	567	0.3438	223296
0.3599	5.040	630	0.3446	247344
0.3488	5.544	693	0.3444	271856
0.3399	6.048	756	0.3411	297472
0.3459	6.552	819	0.3462	322272
0.3201	7.056	882	0.3448	347200
0.3456	7.560	945	0.3445	372576
0.3223	8.064	1008	0.3447	397008
0.3508	8.568	1071	0.3458	421904
0.3284	9.072	1134	0.3412	446720
0.3345	9.576	1197	0.3446	471168
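The rows of the results table can be cross-checked with a short script: every row should imply the same number of optimizer steps per epoch, and the row with the lowest validation loss identifies the best-scoring evaluation point. This is a sketch that only uses the numbers in the table; the variable names are illustrative, and it makes no assumption about which checkpoint was actually saved or published.

```python
import math

# (step, epoch, validation_loss, input_tokens_seen), copied from the results table.
ROWS = [
    (63, 0.504, 1.0210, 24288),
    (126, 1.008, 0.4317, 49584),
    (189, 1.512, 0.3481, 74512),
    (252, 2.016, 0.3425, 99264),
    (315, 2.520, 0.3444, 123360),
    (378, 3.024, 0.3470, 149120),
    (441, 3.528, 0.3454, 174208),
    (504, 4.032, 0.3446, 198016),
    (567, 4.536, 0.3438, 223296),
    (630, 5.040, 0.3446, 247344),
    (693, 5.544, 0.3444, 271856),
    (756, 6.048, 0.3411, 297472),
    (819, 6.552, 0.3462, 322272),
    (882, 7.056, 0.3448, 347200),
    (945, 7.560, 0.3445, 372576),
    (1008, 8.064, 0.3447, 397008),
    (1071, 8.568, 0.3458, 421904),
    (1134, 9.072, 0.3412, 446720),
    (1197, 9.576, 0.3446, 471168),
]

# Steps per epoch implied by each row; they should all agree.
steps_per_epoch = [step / epoch for step, epoch, _, _ in ROWS]
consistent = all(math.isclose(s, steps_per_epoch[0], rel_tol=1e-6) for s in steps_per_epoch)

# Evaluation point with the lowest validation loss.
best_step, best_epoch, best_loss, _ = min(ROWS, key=lambda row: row[2])

# Average input tokens consumed per optimizer step over the logged run.
tokens_per_step = ROWS[-1][3] / ROWS[-1][0]

print(f"steps per epoch: {steps_per_epoch[0]:.1f} (consistent across rows: {consistent})")
print(f"lowest validation loss {best_loss:.4f} at step {best_step} (epoch {best_epoch})")
print(f"average input tokens per step: {tokens_per_step:.1f}")
```

On the rows as listed, the lowest validation loss is 0.3411 at step 756 (epoch 6.048), with step 1134 (0.3412) nearly tied, so validation loss is essentially flat after roughly the second epoch.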

Model tree: this model is an adapter of the base model meta-llama/Llama-3.2-1B-Instruct.