Extractor_Adaptor_Qwen3_QA_websrc_final

This model is a fine-tuned version of Qwen/Qwen3-0.6B on the web_finetune_train dataset. It achieves the following results on the evaluation set:

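  • Loss: 0.1143 (the lowest validation loss in the training results table, logged at step 700; the last logged value, at step 1300, is 0.1400)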

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1.3e-05
  • train_batch_size: 14
  • eval_batch_size: 16
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 56
  • optimizer: AdamW (OptimizerNames.ADAMW_TORCH) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1.0
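
The hyperparameters above imply two quantities that are easy to check by hand: the effective batch size and the learning rate at a given step. The sketch below is illustrative only. It assumes a single training device (which is what 14 × 4 = 56 suggests), and it assumes roughly 1342 optimizer steps in the epoch, an estimate taken from the training results (step 100 is logged at epoch 0.0745). The function names are hypothetical, and the schedule is a plain-Python rendering of linear warmup followed by half-cosine decay, not code from the training run.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_BATCH = 14    # train_batch_size
GRAD_ACCUM_STEPS = 4     # gradient_accumulation_steps
PEAK_LR = 1.3e-05        # learning_rate
WARMUP_RATIO = 0.1       # lr_scheduler_warmup_ratio

# Assumptions, not stated in the card.
NUM_DEVICES = 1          # implied by 14 * 4 = 56
TOTAL_STEPS = 1342       # estimated from step 100 being logged at epoch 0.0745


def effective_batch_size(per_device, accum_steps, devices):
    """Number of examples contributing to each optimizer update."""
    return per_device * accum_steps * devices


def cosine_lr(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
              warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    print(effective_batch_size(PER_DEVICE_BATCH, GRAD_ACCUM_STEPS, NUM_DEVICES))
    for step in (0, 100, 700, 1300):
        print(step, cosine_lr(step))
```

Under these assumptions the warmup covers the first 135 steps, so the learning rate is still ramping up at the first logged evaluation (step 100) and has decayed close to zero by the last one (step 1300).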

Training results

Training Loss   Epoch    Step   Validation Loss
0.2497          0.0745   100    0.2823
0.0999          0.1490   200    0.1585
0.047           0.2235   300    0.1330
0.0317          0.2981   400    0.1230
0.0303          0.3726   500    0.1318
0.0321          0.4471   600    0.1310
0.028           0.5216   700    0.1143
0.0308          0.5961   800    0.1453
0.0266          0.6706   900    0.1385
0.0202          0.7452   1000   0.1351
0.028           0.8197   1100   0.1324
0.0234          0.8942   1200   0.1401
0.0268          0.9687   1300   0.1400
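
The logged rows can be checked mechanically. The sketch below copies the rows above into a list, finds the evaluation with the lowest validation loss, and estimates the number of optimizer steps per epoch from the step/epoch ratio. The helper names are hypothetical and the step estimate is only as precise as the four-decimal epoch values allow.

```python
# (training_loss, epoch, step, validation_loss), copied from the results table.
ROWS = [
    (0.2497, 0.0745, 100, 0.2823),
    (0.0999, 0.1490, 200, 0.1585),
    (0.047, 0.2235, 300, 0.1330),
    (0.0317, 0.2981, 400, 0.1230),
    (0.0303, 0.3726, 500, 0.1318),
    (0.0321, 0.4471, 600, 0.1310),
    (0.028, 0.5216, 700, 0.1143),
    (0.0308, 0.5961, 800, 0.1453),
    (0.0266, 0.6706, 900, 0.1385),
    (0.0202, 0.7452, 1000, 0.1351),
    (0.028, 0.8197, 1100, 0.1324),
    (0.0234, 0.8942, 1200, 0.1401),
    (0.0268, 0.9687, 1300, 0.1400),
]


def best_checkpoint(rows):
    """Return the row with the lowest validation loss."""
    return min(rows, key=lambda row: row[3])


def estimated_steps_per_epoch(rows):
    """Average step/epoch ratio across all logged rows, rounded."""
    ratios = [step / epoch for _, epoch, step, _ in rows]
    return round(sum(ratios) / len(ratios))


if __name__ == "__main__":
    _, epoch, step, val_loss = best_checkpoint(ROWS)
    print(f"lowest validation loss {val_loss} at step {step} (epoch {epoch})")
    print("estimated steps per epoch:", estimated_steps_per_epoch(ROWS))
```

Validation loss reaches its minimum at step 700 and stays between 0.13 and 0.15 afterwards while training loss keeps drifting down, so the later checkpoints do not improve on it by this metric.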

Framework versions

  • PEFT 0.17.1
  • Transformers 4.57.1
  • PyTorch 2.4.1+cu124
  • Datasets 4.0.0
  • Tokenizers 0.22.1
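
Because PEFT is listed among the framework versions, the checkpoint is most likely a PEFT adapter that has to be applied on top of the base model rather than a standalone model. The card does not confirm this, so the sketch below is an assumption. The helper name `load_adapter_model` is hypothetical, the repository id is the one shown on the model page, and the prompt format expected by the adapter is not documented, so no generation call is shown.

```python
BASE_MODEL_ID = "Qwen/Qwen3-0.6B"
# Assumption: the adapter weights are published under this repository id.
ADAPTER_ID = "abdo-Mansour/Extractor_Adaptor_Qwen3_QA_websrc_final"


def load_adapter_model(base_id=BASE_MODEL_ID, adapter_id=ADAPTER_ID):
    """Load the base model and attach the adapter weights (downloads both)."""
    # Local imports keep this module importable without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    base_model = AutoModelForCausalLM.from_pretrained(base_id)
    model = PeftModel.from_pretrained(base_model, adapter_id)
    model.eval()  # inference mode: disables dropout
    return tokenizer, model
```

Calling `load_adapter_model()` requires network access and the `transformers` and `peft` packages. Versions close to those listed above are the safest choice.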

Model tree for abdo-Mansour/Extractor_Adaptor_Qwen3_QA_websrc_final: adapter fine-tuned from Qwen/Qwen3-0.6B