# Qwen3-4B Structured Output (SFT + DPO)

This model is a fully merged fine-tune of Qwen/Qwen3-4B-Instruct-2507 for structured output tasks (JSON / YAML / XML / TOML / CSV), trained with a two-stage pipeline: SFT (QLoRA) → DPO.

## Training Pipeline

```text
Qwen/Qwen3-4B-Instruct-2507  (base)
  |
  v
SFT via QLoRA (4-bit, Unsloth)
  - Dataset: v2 + structured-hard-sft-4k + structured-5k-mix-sft (concatenated)
  - Adapter saved as LoRA weights
  |
  v
Merge LoRA adapter into full 16-bit weights
  |
  v
DPO (Direct Preference Optimization)
  - Dataset: u-10bei/dpo-dataset-qwen-cot
  |
  v
GawinGowin/qwen3-4b-struct-output  (this model)
```
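The "merge LoRA adapter into full weights" step above folds the low-rank update into the base matrix as W + (alpha / r) · B·A. A minimal stdlib-only sketch with toy matrices (illustrative shapes and values only, not the actual merge code, which is typically PEFT's `merge_and_unload`):

```python
# Sketch of merging a LoRA adapter into a base weight matrix:
# W_merged = W + (alpha / r) * (B @ A), shown with tiny toy matrices.

def matmul(B, A):
    """Multiply a (d x r) matrix B by an (r x k) matrix A using plain lists."""
    d, r, k = len(B), len(A), len(A[0])
    return [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
            for i in range(d)]

def merge_lora(W, A, B, r, alpha):
    """Fold the scaled low-rank update into the full weight matrix."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# Toy 2x2 base weight with a rank-1 adapter (r=1, alpha=2 -> scale 2.0).
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]      # (r x k) = (1 x 2)
B = [[0.5], [0.25]]   # (d x r) = (2 x 1)
print(merge_lora(W, A, B, r=1, alpha=2))  # → [[2.0, 1.0], [0.5, 1.5]]
```

After merging, the adapter is discarded and the model is saved as plain 16-bit weights, which is why this repository needs no PEFT dependency at inference time.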

## SFT Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit) + Unsloth |
| Datasets | u-10bei/structured_data_with_cot_dataset_512_v2, daichira/structured-hard-sft-4k, daichira/structured-5k-mix-sft |
| Max sequence length | 1024 |
| Epochs | 2 |
| Learning rate | 5e-6 |
| LoRA r / alpha | 64 / 128 |
| CoT masking | Enabled (loss applied only after the `Output:` marker) |
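The CoT-masking row means the SFT loss is computed only on tokens after the `Output:` marker: chain-of-thought tokens are masked with -100, the ignore index used by PyTorch's cross-entropy loss. A minimal sketch, assuming token ids are already available as a list and the marker's end position is known (the helper name is hypothetical, not the training code):

```python
IGNORE_INDEX = -100  # standard ignore index for cross-entropy loss

def mask_cot_labels(input_ids, marker_end):
    """Copy input_ids into labels, masking everything up to and including
    the 'Output:' marker so loss applies only to the structured answer."""
    return [IGNORE_INDEX if i < marker_end else tok
            for i, tok in enumerate(input_ids)]

# Toy example: tokens 0..3 are prompt + chain-of-thought, marker ends at index 4.
labels = mask_cot_labels([11, 22, 33, 44, 55, 66], marker_end=4)
print(labels)  # → [-100, -100, -100, -100, 55, 66]
```

This keeps the reasoning tokens in context during the forward pass while ensuring gradients come only from the final structured output.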

## DPO Configuration

| Parameter | Value |
|---|---|
| Method | DPO (TRL) + Unsloth |
| Dataset | u-10bei/dpo-dataset-qwen-cot |
| Epochs | 1 |
| Learning rate | 1e-7 |
| Beta | 0.1 |
| LoRA r / alpha | 8 / 16 (merged) |
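The beta of 0.1 scales how strongly DPO pushes the policy's chosen-vs-rejected log-probability margin away from the frozen reference model. A stdlib sketch of the per-pair loss, -log σ(β · (margin_policy − margin_ref)), with made-up log-probabilities (TRL's `DPOTrainer` computes this over batches; this is only the scalar form):

```python
import math

def dpo_loss(pi_chosen, pi_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Single-pair DPO loss: -log(sigmoid(beta * (margin_policy - margin_ref))),
    where each margin is log-prob(chosen) - log-prob(rejected)."""
    margin = (pi_chosen - pi_rejected) - (ref_chosen - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen answer more than the reference does,
# the loss falls below the neutral value -log(0.5) ≈ 0.693.
print(dpo_loss(-1.0, -5.0, -2.0, -3.0))
```

The very small learning rate (1e-7) keeps the preference update gentle so the merged SFT behavior is not overwritten.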

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "GawinGowin/qwen3-4b-struct-output"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto",
)
```
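Since training masks loss until the `Output:` marker, generated text is expected to contain the structured payload after that marker. A stdlib-only sketch for extracting and validating a JSON payload from generated text (the marker convention is inferred from the SFT setup above; adjust it to your prompt format):

```python
import json

def extract_json(generated_text, marker="Output:"):
    """Take everything after the last 'Output:' marker and parse it as JSON.
    Raises ValueError if the marker is missing or the payload is invalid."""
    _, sep, payload = generated_text.rpartition(marker)
    if not sep:
        raise ValueError("marker not found in generated text")
    return json.loads(payload.strip())

text = 'Reasoning about the schema... Output: {"name": "qwen", "size_b": 4}'
print(extract_json(text))  # → {'name': 'qwen', 'size_b': 4}
```

For YAML, XML, TOML, or CSV outputs, swap `json.loads` for the corresponding parser and validate the same way.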

## Sources & License
