Pruner_Adaptor_Qwen_3_FINAL_EXTRA

This model is a fine-tuned version of Qwen/Qwen3-0.6B on the web_finetune_train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1108
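
Since this repository is a PEFT adapter on Qwen/Qwen3-0.6B rather than a standalone model, it is loaded by attaching the adapter to the base model. The snippet below is a minimal, unofficial sketch; the adapter repo id is assumed to match this card's repository name.

```python
# Hedged loading sketch, not an official snippet from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen3-0.6B")
# Adapter repo id assumed from this card's repository name.
model = PeftModel.from_pretrained(base, "abdo-Mansour/Pruner_Adaptor_Qwen_3_FINAL_EXTRA")
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

inputs = tokenizer("Hello!", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```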

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch reproducing them follows the list):

  • learning_rate: 1.2e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 4
  • optimizer: AdamW (torch fused, ADAMW_TORCH_FUSED) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
  • mixed_precision_training: Native AMP
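
These settings correspond to a transformers TrainingArguments configuration along the lines of the sketch below. This is not the author's actual training script; the output_dir name and the fp16 flag ("Native AMP" could also mean bf16) are assumptions.

```python
# Hedged configuration sketch matching the hyperparameters listed above.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="Pruner_Adaptor_Qwen_3_FINAL_EXTRA",  # assumed name
    learning_rate=1.2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    seed=42,
    gradient_accumulation_steps=8,  # 2 per device x 2 GPUs x 8 steps = 32 effective
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1.0,
    fp16=True,  # assumed; the card says "Native AMP"
)
```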

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.1234        | 0.0385 | 50   | 0.1188          |
| 0.1271        | 0.0770 | 100  | 0.1215          |
| 0.1242        | 0.1155 | 150  | 0.1278          |
| 0.1262        | 0.1540 | 200  | 0.1296          |
| 0.1268        | 0.1925 | 250  | 0.1261          |
| 0.106         | 0.2310 | 300  | 0.1267          |
| 0.1523        | 0.2695 | 350  | 0.1307          |
| 0.1448        | 0.3080 | 400  | 0.1227          |
| 0.1547        | 0.3465 | 450  | 0.1247          |
| 0.1381        | 0.3849 | 500  | 0.1239          |
| 0.1431        | 0.4234 | 550  | 0.1213          |
| 0.1173        | 0.4619 | 600  | 0.1187          |
| 0.1056        | 0.5004 | 650  | 0.1197          |
| 0.0919        | 0.5389 | 700  | 0.1166          |
| 0.1154        | 0.5774 | 750  | 0.1194          |
| 0.1116        | 0.6159 | 800  | 0.1160          |
| 0.1378        | 0.6544 | 850  | 0.1157          |
| 0.1122        | 0.6929 | 900  | 0.1154          |
| 0.1321        | 0.7314 | 950  | 0.1156          |
| 0.0823        | 0.7699 | 1000 | 0.1165          |
| 0.1321        | 0.8084 | 1050 | 0.1115          |
| 0.1015        | 0.8469 | 1100 | 0.1116          |
| 0.1224        | 0.8854 | 1150 | 0.1108          |
| 0.1006        | 0.9239 | 1200 | 0.1110          |
| 0.1294        | 0.9624 | 1250 | 0.1110          |

Framework versions

  • PEFT 0.15.2
  • Transformers 4.57.1
  • PyTorch 2.9.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1