Pruner_Adaptor_Qwen_3_FINAL

This model is a fine-tuned version of Qwen/Qwen3-0.6B on the web_finetune_train dataset. It achieves the following results on the evaluation set:

  • Loss: 0.1414

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1.7e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 32
  • total_eval_batch_size: 4
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 2.0
  • mixed_precision_training: Native AMP
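The derived values in the list above (total batch sizes, and the shape of the cosine schedule with 10% warmup) can be checked with a short sketch. The `lr_at` helper mirrors the usual linear-warmup-then-cosine-decay shape (as in `transformers`' `get_cosine_schedule_with_warmup`); it is an illustration, not the exact trainer code, and the total step count is an estimate from the logged epoch/step ratio.

```python
import math

# Effective batch sizes implied by the hyperparameters above.
train_batch_size = 2
num_devices = 2
gradient_accumulation_steps = 8
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps
print(total_train_batch_size)  # 32

eval_batch_size = 2
total_eval_batch_size = eval_batch_size * num_devices
print(total_eval_batch_size)  # 4

# Cosine decay with linear warmup over the first warmup_ratio of training
# (a sketch of the schedule shape, not the exact trainer implementation).
def lr_at(step, total_steps, base_lr=1.7e-5, warmup_ratio=0.1):
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))

total_steps = 1494  # ~2 epochs, estimated from the logged step/epoch ratio
print(lr_at(0, total_steps))            # 0.0 at the start of warmup
print(lr_at(total_steps, total_steps))  # decays to ~0 at the end
```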

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.3343        | 0.0669 | 50   | 0.3391          |
| 0.2475        | 0.1338 | 100  | 0.2643          |
| 0.2304        | 0.2008 | 150  | 0.2300          |
| 0.2334        | 0.2677 | 200  | 0.2076          |
| 0.2074        | 0.3346 | 250  | 0.2157          |
| 0.1936        | 0.4015 | 300  | 0.1944          |
| 0.1821        | 0.4685 | 350  | 0.1952          |
| 0.188         | 0.5354 | 400  | 0.1767          |
| 0.1468        | 0.6023 | 450  | 0.1758          |
| 0.1687        | 0.6692 | 500  | 0.1784          |
| 0.1516        | 0.7362 | 550  | 0.1691          |
| 0.1836        | 0.8031 | 600  | 0.1628          |
| 0.1488        | 0.8700 | 650  | 0.1566          |
| 0.1698        | 0.9369 | 700  | 0.1554          |
| 0.1213        | 1.0027 | 750  | 0.1608          |
| 0.1281        | 1.0696 | 800  | 0.1592          |
| 0.1214        | 1.1365 | 850  | 0.1500          |
| 0.0991        | 1.2034 | 900  | 0.1510          |
| 0.1201        | 1.2704 | 950  | 0.1536          |
| 0.1103        | 1.3373 | 1000 | 0.1543          |
| 0.1147        | 1.4042 | 1050 | 0.1527          |
| 0.1154        | 1.4711 | 1100 | 0.1484          |
| 0.1174        | 1.5381 | 1150 | 0.1447          |
| 0.0903        | 1.6050 | 1200 | 0.1432          |
| 0.0914        | 1.6719 | 1250 | 0.1427          |
| 0.0913        | 1.7388 | 1300 | 0.1414          |
| 0.0889        | 1.8058 | 1350 | 0.1415          |
| 0.1036        | 1.8727 | 1400 | 0.1417          |
| 0.0915        | 1.9396 | 1450 | 0.1421          |
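The validation loss bottoms out at 0.1414 (step 1300) and stays essentially flat through the end of training, which matches the 0.1414 reported at the top of this card. Picking the best checkpoint from the evaluation log programmatically (pairs below are transcribed from the tail of the table above):

```python
# (step, validation_loss) pairs from the tail of the log above.
eval_log = [
    (1150, 0.1447), (1200, 0.1432), (1250, 0.1427),
    (1300, 0.1414), (1350, 0.1415), (1400, 0.1417), (1450, 0.1421),
]

# The checkpoint with the lowest validation loss.
best_step, best_loss = min(eval_log, key=lambda pair: pair[1])
print(best_step, best_loss)  # 1300 0.1414
```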

Framework versions

  • PEFT 0.15.2
  • Transformers 4.57.1
  • Pytorch 2.9.0+cu128
  • Datasets 3.6.0
  • Tokenizers 0.22.1
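With the framework versions above, the adapter can be loaded on top of the base model via standard PEFT usage (`PeftModel.from_pretrained`). This is a minimal sketch: the repo ids come from this card, while the prompt and generation settings are illustrative. Running it downloads the base model and adapter weights from the Hub.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "Qwen/Qwen3-0.6B"
adapter_id = "abdo-Mansour/Pruner_Adaptor_Qwen_3_FINAL"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base_model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base_model, adapter_id)  # attach the adapter weights

inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```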
Model tree for abdo-Mansour/Pruner_Adaptor_Qwen_3_FINAL

  • Base model: Qwen/Qwen3-0.6B
  • This model: PEFT adapter fine-tuned from the base model