train_wsc_42_1763998312

This model is a fine-tuned version of meta-llama/Llama-3.2-1B-Instruct on the wsc dataset. It achieves the following results on the evaluation set:

Model description

More information needed

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 4
eval_batch_size: 4
seed: 42
optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 10
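
To make the scheduler settings above concrete, here is a minimal sketch of a cosine schedule with linear warmup, written in plain Python. It is an illustration under stated assumptions, not the training code itself: the total of 1250 optimizer steps is inferred from the results table (63 steps at epoch 0.504 suggests 125 steps per epoch, times 10 epochs), the warmup length is taken as `round(total_steps * warmup_ratio)`, and `lr_at` is a hypothetical helper name.

```python
import math

BASE_LR = 5e-05       # learning_rate from the hyperparameter list
WARMUP_RATIO = 0.1    # lr_scheduler_warmup_ratio from the hyperparameter list
TOTAL_STEPS = 1250    # assumption: 125 steps per epoch x 10 epochs


def lr_at(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS, warmup_ratio=WARMUP_RATIO):
    """Learning rate at a given optimizer step (hypothetical helper)."""
    # Assumption: warmup length is the rounded fraction of the total step count.
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the base learning rate.
        return base_lr * step / max(1, warmup_steps)
    # Half-cosine decay from the base learning rate down to 0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Print the schedule at a few of the evaluation steps from the table.
    for step in (0, 63, 126, 630, 1197):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at 5e-05 around step 125 and decays toward zero by the final step.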

Training Loss	Epoch	Step	Validation Loss	Input Tokens Seen
0.378	0.504	63	0.3661	24288
0.3756	1.008	126	0.3533	49584
0.3497	1.512	189	0.3516	74512
0.3604	2.016	252	0.3515	99264
0.3429	2.52	315	0.3537	123360
0.3684	3.024	378	0.3799	149120
0.3434	3.528	441	0.3506	174208
0.339	4.032	504	0.3518	198016
0.3547	4.536	567	0.3559	223296
0.3624	5.04	630	0.3515	247344
0.3447	5.544	693	0.3524	271856
0.329	6.048	756	0.3479	297472
0.3332	6.552	819	0.3579	322272
0.3211	7.056	882	0.3529	347200
0.344	7.56	945	0.3525	372576
0.3216	8.064	1008	0.3517	397008
0.3433	8.568	1071	0.3513	421904
0.3399	9.072	1134	0.3479	446720
0.3337	9.576	1197	0.3513	471168
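
One way to read the table is to look for the evaluation steps with the lowest validation loss. The following is a minimal sketch using the (step, validation loss) pairs copied from the rows above; `best_evals` is a hypothetical helper name, not part of any library.

```python
# (step, validation_loss) pairs copied from the results table.
EVALS = [
    (63, 0.3661), (126, 0.3533), (189, 0.3516), (252, 0.3515),
    (315, 0.3537), (378, 0.3799), (441, 0.3506), (504, 0.3518),
    (567, 0.3559), (630, 0.3515), (693, 0.3524), (756, 0.3479),
    (819, 0.3579), (882, 0.3529), (945, 0.3525), (1008, 0.3517),
    (1071, 0.3513), (1134, 0.3479), (1197, 0.3513),
]


def best_evals(evals):
    """Return the lowest validation loss and every step that reached it."""
    lowest = min(loss for _, loss in evals)
    steps = [step for step, loss in evals if loss == lowest]
    return lowest, steps


if __name__ == "__main__":
    lowest, steps = best_evals(EVALS)
    print(lowest, steps)  # prints: 0.3479 [756, 1134]
```

The validation loss stays within a narrow band across the run (0.3479 to 0.3799), so the differences between evaluation steps are small.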
