qwen3-4b-structured-sft-dpo

A two-stage fine-tuned model: supervised fine-tuning (SFT) followed by Direct Preference Optimization (DPO).

Training

  • Stage 1 (SFT): QLoRA on structured_data_with_cot_dataset_512_v2 (LR=2e-6, Epochs=2, LoRA r=64)
  • Stage 2 (DPO): preference tuning on dpo-dataset-qwen-cot (LR=1e-7, Epochs=1, Beta=0.1, LoRA r=8)
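Stage 2 optimizes the DPO objective with Beta=0.1, which scales the implicit reward margin between the chosen and rejected responses. A minimal, illustrative sketch of the per-example loss (not the actual training code; the log-prob values in the example are made up):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-example DPO loss from policy and reference log-probs of the
    chosen and rejected responses."""
    # Implicit reward margin: how much more the policy prefers the
    # chosen response over the rejected one, relative to the reference.
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    # -log sigmoid(beta * margin)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy and reference assign identical log-probs, the margin
# is zero and the loss is -log(0.5) = log(2) ≈ 0.6931.
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))
```

A smaller beta (as here, 0.1) tolerates larger divergence from the reference model before the loss saturates.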
Model size: 4B params (Safetensors)
Tensor type: BF16
