# Qwen3-4B Structured Output (SFT + DPO)

This model is a fully merged fine-tune of Qwen/Qwen3-4B-Instruct-2507 for structured output tasks (JSON / YAML / XML / TOML / CSV), trained via a two-stage pipeline: SFT (QLoRA) → DPO.
## Training Pipeline

```text
Qwen/Qwen3-4B-Instruct-2507 (base)
        |
        v
SFT via QLoRA (4-bit, Unsloth)
  - Datasets: structured_data_with_cot_dataset_512_v2 + structured-hard-sft-4k + structured-5k-mix-sft (concatenated)
  - Adapter saved as LoRA weights
        |
        v
Merge LoRA adapter into full 16-bit weights
        |
        v
DPO (Direct Preference Optimization)
  - Dataset: u-10bei/dpo-dataset-qwen-cot
        |
        v
GawinGowin/qwen3-4b-struct-output (this model)
```
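The merge step folds the LoRA update back into the base weights: for each adapted matrix, W' = W + (alpha/r)·B·A, after which inference needs no adapter code. A minimal NumPy sketch of that arithmetic (shapes and values are illustrative only, not the model's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only; real model layers are far larger.
d_out, d_in, r, alpha = 8, 8, 4, 8

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(size=(r, d_in))                  # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01          # LoRA up-projection

# Merging folds the low-rank update into the base weight,
# scaled by alpha / r:
W_merged = W + (alpha / r) * (B @ A)

# The merged matrix reproduces the adapter-augmented forward pass.
x = rng.normal(size=(d_in,))
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(W_merged @ x, y_adapter)
```

In practice this is what an adapter-merge utility does for every targeted layer; the merged 16-bit checkpoint is then used as the starting point for DPO.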
## SFT Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit) + Unsloth |
| Datasets | u-10bei/structured_data_with_cot_dataset_512_v2, daichira/structured-hard-sft-4k, daichira/structured-5k-mix-sft |
| Max sequence length | 1024 |
| Epochs | 2 |
| Learning rate | 5e-6 |
| LoRA r / alpha | 64 / 128 |
| CoT Masking | Enabled (loss applied after the `Output:` marker only) |
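CoT masking means tokens up to and including the `Output:` marker contribute no loss: their labels are set to -100, the ignore index of PyTorch's cross-entropy. A sketch of the label construction, using toy token IDs instead of the real tokenizer (the function name and marker IDs are illustrative, not from the training code):

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch CrossEntropyLoss

def mask_before_marker(input_ids, marker_ids):
    """Copy input_ids into labels, masking everything up to and
    including the first occurrence of the marker token span."""
    labels = list(input_ids)
    n = len(marker_ids)
    for i in range(len(input_ids) - n + 1):
        if input_ids[i:i + n] == marker_ids:
            for j in range(i + n):
                labels[j] = IGNORE_INDEX
            return labels
    return labels  # marker absent: train on the full sequence

# Toy example: ids [7, 8] stand in for the tokenized "Output:" marker.
ids = [1, 2, 3, 7, 8, 4, 5, 6]
print(mask_before_marker(ids, [7, 8]))
# [-100, -100, -100, -100, -100, 4, 5, 6]
```

This way the chain-of-thought prompt still conditions generation, but gradients flow only through the structured answer tokens.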
## DPO Configuration
| Parameter | Value |
|---|---|
| Method | DPO (TRL) + Unsloth |
| Dataset | u-10bei/dpo-dataset-qwen-cot |
| Epochs | 1 |
| Learning rate | 1e-7 |
| Beta | 0.1 |
| LoRA r / alpha | 8 / 16 (merged) |
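For each preference pair, DPO minimizes -log σ(β·(Δ_chosen - Δ_rejected)), where each Δ is the policy-vs-reference log-probability ratio of that response. A numeric sketch of the per-pair loss with the β = 0.1 used here (the log-prob values are made up for illustration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio
    minus rejected log-ratio)), ratios taken against the frozen
    reference model."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy == reference, both ratios are 0 and loss = -log(0.5).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Shifting probability mass toward the chosen response lowers it:
print(dpo_loss(-9.0, -13.0, -10.0, -12.0) < math.log(2))  # True
```

β controls how strongly the policy is penalized for drifting from the reference; the small value (0.1) together with the low learning rate (1e-7) keeps the DPO stage a gentle preference correction rather than a large policy shift.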
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "GawinGowin/qwen3-4b-struct-output"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto",
)

# Generate structured output via the chat template:
messages = [{"role": "user", "content": "Convert to JSON: name Alice, age 30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Sources & License
- Base model: Qwen/Qwen3-4B-Instruct-2507
- SFT data: u-10bei/structured_data_with_cot_dataset_512_v2, daichira/structured-hard-sft-4k, daichira/structured-5k-mix-sft — MIT License
- DPO data: u-10bei/dpo-dataset-qwen-cot — MIT License
- License: Apache 2.0
- Compliance: Users must comply with the MIT licenses of the training datasets and with the original terms of use of the base model.