Pruner_Adaptor_Qwen_3_FINAL_EXTRA

This model is a fine-tuned version of Qwen/Qwen3-0.6B on the web_finetune_train dataset. The card does not list final evaluation-set results; the last validation loss logged during training was 0.1110 (step 1250, epoch 0.9624).

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

The following hyperparameters were used during training:

learning_rate: 1.2e-05
train_batch_size: 2
eval_batch_size: 2
seed: 42
distributed_type: multi-GPU
num_devices: 2
gradient_accumulation_steps: 8
total_train_batch_size: 32
total_eval_batch_size: 4
optimizer: AdamW (OptimizerNames.ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 1.0
mixed_precision_training: Native AMP
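
The batch-size totals and the learning-rate schedule implied by the hyperparameters above can be sketched in a few lines of plain Python. This is an illustration only, not the training code: the names `effective_batch_sizes` and `cosine_lr` are hypothetical, the total step count is an estimate taken from the epoch column of the results table, and the schedule is the usual linear-warmup plus cosine-decay shape, which may differ in detail from what the run actually used.

```python
import math

# Values copied from the hyperparameter list above.
PER_DEVICE_TRAIN_BATCH = 2
PER_DEVICE_EVAL_BATCH = 2
NUM_DEVICES = 2
GRAD_ACCUM_STEPS = 8
PEAK_LR = 1.2e-05
WARMUP_RATIO = 0.1

# ASSUMPTION: total optimizer steps in the single epoch, estimated from the
# results table (step 50 is logged at epoch 0.0385, and 50 / 0.0385 is about 1299).
TOTAL_STEPS = 1299


def effective_batch_sizes():
    """Return (total_train_batch_size, total_eval_batch_size)."""
    # Training: per-device batch x number of devices x accumulation steps.
    train = PER_DEVICE_TRAIN_BATCH * NUM_DEVICES * GRAD_ACCUM_STEPS
    # Evaluation: no gradient accumulation, so only per-device batch x devices.
    evaluation = PER_DEVICE_EVAL_BATCH * NUM_DEVICES
    return train, evaluation


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
              warmup_ratio=WARMUP_RATIO):
    """Learning rate at `step`: linear warmup, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Linear ramp from 0 up to the peak learning rate.
        return peak_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_sizes())
    for s in (0, 65, 130, 700, 1299):
        print(s, cosine_lr(s))
```

The first function reproduces the listed totals (2 x 2 x 8 = 32 for training, 2 x 2 = 4 for evaluation). Under the assumed 1299 steps, warmup covers roughly the first 130 steps, after which the rate decays toward zero by the end of the epoch.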

Training Loss	Epoch	Step	Validation Loss
0.1234	0.0385	50	0.1188
0.1271	0.0770	100	0.1215
0.1242	0.1155	150	0.1278
0.1262	0.1540	200	0.1296
0.1268	0.1925	250	0.1261
0.106	0.2310	300	0.1267
0.1523	0.2695	350	0.1307
0.1448	0.3080	400	0.1227
0.1547	0.3465	450	0.1247
0.1381	0.3849	500	0.1239
0.1431	0.4234	550	0.1213
0.1173	0.4619	600	0.1187
0.1056	0.5004	650	0.1197
0.0919	0.5389	700	0.1166
0.1154	0.5774	750	0.1194
0.1116	0.6159	800	0.1160
0.1378	0.6544	850	0.1157
0.1122	0.6929	900	0.1154
0.1321	0.7314	950	0.1156
0.0823	0.7699	1000	0.1165
0.1321	0.8084	1050	0.1115
0.1015	0.8469	1100	0.1116
0.1224	0.8854	1150	0.1108
0.1006	0.9239	1200	0.1110
0.1294	0.9624	1250	0.1110
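
To read the table above at a glance, the logged rows can be scanned for the lowest validation loss. The snippet below is a small sketch over the values copied from the table; the helper names `best_row` and `validation_change` are hypothetical and not part of any training library.

```python
# Rows copied from the table above:
# (training_loss, epoch, step, validation_loss)
ROWS = [
    (0.1234, 0.0385, 50, 0.1188),
    (0.1271, 0.0770, 100, 0.1215),
    (0.1242, 0.1155, 150, 0.1278),
    (0.1262, 0.1540, 200, 0.1296),
    (0.1268, 0.1925, 250, 0.1261),
    (0.106, 0.2310, 300, 0.1267),
    (0.1523, 0.2695, 350, 0.1307),
    (0.1448, 0.3080, 400, 0.1227),
    (0.1547, 0.3465, 450, 0.1247),
    (0.1381, 0.3849, 500, 0.1239),
    (0.1431, 0.4234, 550, 0.1213),
    (0.1173, 0.4619, 600, 0.1187),
    (0.1056, 0.5004, 650, 0.1197),
    (0.0919, 0.5389, 700, 0.1166),
    (0.1154, 0.5774, 750, 0.1194),
    (0.1116, 0.6159, 800, 0.1160),
    (0.1378, 0.6544, 850, 0.1157),
    (0.1122, 0.6929, 900, 0.1154),
    (0.1321, 0.7314, 950, 0.1156),
    (0.0823, 0.7699, 1000, 0.1165),
    (0.1321, 0.8084, 1050, 0.1115),
    (0.1015, 0.8469, 1100, 0.1116),
    (0.1224, 0.8854, 1150, 0.1108),
    (0.1006, 0.9239, 1200, 0.1110),
    (0.1294, 0.9624, 1250, 0.1110),
]


def best_row(rows):
    """Return the logged row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def validation_change(rows):
    """Return last-logged minus first-logged validation loss (negative = improved)."""
    return rows[-1][3] - rows[0][3]


if __name__ == "__main__":
    _, epoch, step, val_loss = best_row(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
    print(f"change from first to last evaluation: {validation_change(ROWS):+.4f}")
```

On these rows the lowest validation loss is 0.1108 at step 1150, and the validation loss falls from 0.1188 at the first evaluation to 0.1110 at the last. The per-step training loss is much noisier than the validation loss, so the validation column is the more reliable signal here.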
