qwen_rag_sft

This model is a fine-tuned version of Qwen/Qwen3-0.6B-Base on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.6532
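
If the reported loss is the mean token-level cross-entropy (the usual convention for causal-LM fine-tuning with the `Trainer`), it corresponds to a perplexity of roughly exp(0.6532) ≈ 1.92. A quick sanity check, not a figure from the original card:

```python
import math

# Assuming the evaluation loss is mean cross-entropy per token,
# perplexity is simply its exponential.
eval_loss = 0.6532
perplexity = math.exp(eval_loss)
print(f"perplexity = {perplexity:.2f}")  # 1.92
```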

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 2e-05
  • train_batch_size: 2
  • eval_batch_size: 2
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 8
  • optimizer: AdamW (torch implementation) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 100
  • num_epochs: 1
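
The effective batch size and the linear warmup schedule above can be sketched in plain Python. Note that `total_steps` below is a hypothetical value, since the dataset size is not reported:

```python
# Effective batch size: per-device batch size x gradient accumulation steps.
train_batch_size = 2
gradient_accumulation_steps = 4
total_train_batch_size = train_batch_size * gradient_accumulation_steps  # 8

# Linear schedule with warmup, in the spirit of transformers'
# get_linear_schedule_with_warmup: ramp up linearly for `warmup` steps,
# then decay linearly to zero at `total` steps.
def linear_lr(step, base_lr=2e-5, warmup=100, total=7000):
    # `total` is a hypothetical step count for illustration.
    if step < warmup:
        return base_lr * step / warmup
    return base_lr * max(0.0, (total - step) / (total - warmup))

print(total_train_batch_size)  # 8
print(linear_lr(100))          # peak learning rate: 2e-05
print(linear_lr(7000))         # end of training: 0.0
```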

Training results

| Training Loss | Epoch  | Step | Validation Loss |
|:-------------:|:------:|:----:|:---------------:|
| 0.7189        | 0.0287 | 200  | 0.7363          |
| 0.6892        | 0.0574 | 400  | 0.7126          |
| 0.6684        | 0.0861 | 600  | 0.6995          |
| 0.6602        | 0.1148 | 800  | 0.6907          |
| 0.6835        | 0.1435 | 1000 | 0.6842          |
| 0.6806        | 0.1723 | 1200 | 0.6790          |
| 0.6515        | 0.2010 | 1400 | 0.6751          |
| 0.6698        | 0.2297 | 1600 | 0.6716          |
| 0.6389        | 0.2584 | 1800 | 0.6690          |
| 0.6508        | 0.2871 | 2000 | 0.6670          |
| 0.6226        | 0.3158 | 2200 | 0.6651          |
| 0.6541        | 0.3445 | 2400 | 0.6631          |
| 0.6413        | 0.3732 | 2600 | 0.6616          |
| 0.6344        | 0.4019 | 2800 | 0.6605          |
| 0.6427        | 0.4306 | 3000 | 0.6593          |
| 0.6401        | 0.4593 | 3200 | 0.6584          |
| 0.6378        | 0.4880 | 3400 | 0.6574          |
| 0.6747        | 0.5168 | 3600 | 0.6567          |
| 0.6145        | 0.5455 | 3800 | 0.6562          |
| 0.6439        | 0.5742 | 4000 | 0.6556          |
| 0.6516        | 0.6029 | 4200 | 0.6552          |
| 0.6607        | 0.6316 | 4400 | 0.6548          |
| 0.6227        | 0.6603 | 4600 | 0.6543          |
| 0.6368        | 0.6890 | 4800 | 0.6541          |
| 0.6656        | 0.7177 | 5000 | 0.6539          |
| 0.6310        | 0.7464 | 5200 | 0.6537          |
| 0.6390        | 0.7751 | 5400 | 0.6536          |
| 0.6428        | 0.8038 | 5600 | 0.6535          |
| 0.6292        | 0.8326 | 5800 | 0.6534          |
| 0.6412        | 0.8613 | 6000 | 0.6533          |
| 0.6408        | 0.8900 | 6200 | 0.6533          |
| 0.6631        | 0.9187 | 6400 | 0.6532          |
| 0.6471        | 0.9474 | 6600 | 0.6531          |
| 0.6514        | 0.9761 | 6800 | 0.6532          |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.5.1+cu124
  • Datasets 3.2.0
  • Tokenizers 0.21.0