# Qwen3-4B Structured Output (SFT + DPO)

This model is a fully merged fine-tune of Qwen/Qwen3-4B-Instruct-2507 for structured output tasks (JSON / YAML / XML / TOML / CSV), trained via a two-stage pipeline: SFT (QLoRA) → DPO.
## Training Pipeline

```text
Qwen/Qwen3-4B-Instruct-2507 (base)
        |
        v
SFT via QLoRA (4-bit, Unsloth)
  - Datasets: structured_data_with_cot_dataset_512_v2 + structured-hard-sft-4k + structured-5k-mix-sft (concatenated)
  - Adapter saved as LoRA weights
        |
        v
Merge LoRA adapter into full 16-bit weights
        |
        v
DPO (Direct Preference Optimization)
  - Dataset: u-10bei/dpo-dataset-qwen-cot
        |
        v
GawinGowin/qwen3-4b-struct-output (this model)
```
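The merge step folds the LoRA update back into the base weights: for each adapted matrix, W' = W + (alpha/r)·B·A, after which inference needs no adapter code. A minimal NumPy sketch of that arithmetic (shapes and values are illustrative only, not the model's):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative shapes only; real model layers are far larger.
d_out, d_in, r, alpha = 8, 8, 4, 8

W = rng.normal(size=(d_out, d_in))              # frozen base weight
A = rng.normal(size=(r, d_in))                  # LoRA down-projection
B = rng.normal(size=(d_out, r)) * 0.01          # LoRA up-projection

# Merging folds the low-rank update into the base weight,
# scaled by alpha / r:
W_merged = W + (alpha / r) * (B @ A)

# The merged matrix reproduces the adapter-augmented forward pass.
x = rng.normal(size=(d_in,))
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))
assert np.allclose(W_merged @ x, y_adapter)
```

In practice this is what an adapter-merge utility does for every targeted layer; the merged 16-bit checkpoint is then used as the starting point for DPO.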
## SFT Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit) + Unsloth |
| Datasets | u-10bei/structured_data_with_cot_dataset_512_v2, daichira/structured-hard-sft-4k, daichira/structured-5k-mix-sft |
| Max sequence length | 1024 |
| Epochs | 2 |
| Learning rate | 5e-6 |
| LoRA r / alpha | 64 / 128 |
| CoT Masking | Enabled (loss applied after the `Output:` marker only) |
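CoT masking means tokens up to and including the `Output:` marker contribute no loss: their labels are set to -100, the ignore index of PyTorch's cross-entropy. A sketch of the label construction, using toy token IDs instead of the real tokenizer (the function name and marker IDs are illustrative, not from the training code):

```python
IGNORE_INDEX = -100  # label value ignored by PyTorch CrossEntropyLoss

def mask_before_marker(input_ids, marker_ids):
    """Copy input_ids into labels, masking everything up to and
    including the first occurrence of the marker token span."""
    labels = list(input_ids)
    n = len(marker_ids)
    for i in range(len(input_ids) - n + 1):
        if input_ids[i:i + n] == marker_ids:
            for j in range(i + n):
                labels[j] = IGNORE_INDEX
            return labels
    return labels  # marker absent: train on the full sequence

# Toy example: ids [7, 8] stand in for the tokenized "Output:" marker.
ids = [1, 2, 3, 7, 8, 4, 5, 6]
print(mask_before_marker(ids, [7, 8]))
# [-100, -100, -100, -100, -100, 4, 5, 6]
```

This way the chain-of-thought prompt still conditions generation, but gradients flow only through the structured answer tokens.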
## DPO Configuration
| Parameter | Value |
|---|---|
| Method | DPO (TRL) + Unsloth |
| Dataset | u-10bei/dpo-dataset-qwen-cot |
| Epochs | 1 |
| Learning rate | 1e-7 |
| Beta | 0.1 |
| LoRA r / alpha | 8 / 16 (merged) |
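For each preference pair, DPO minimizes -log σ(β·(Δ_chosen - Δ_rejected)), where each Δ is the policy-vs-reference log-probability ratio of that response. A numeric sketch of the per-pair loss with the β = 0.1 used here (the log-prob values are made up for illustration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (chosen log-ratio
    minus rejected log-ratio)), ratios taken against the frozen
    reference model."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When policy == reference, both ratios are 0 and loss = -log(0.5).
print(round(dpo_loss(-10.0, -12.0, -10.0, -12.0), 4))  # 0.6931
# Shifting probability mass toward the chosen response lowers it:
print(dpo_loss(-9.0, -13.0, -10.0, -12.0) < math.log(2))  # True
```

β controls how strongly the policy is penalized for drifting from the reference; the small value (0.1) together with the low learning rate (1e-7) keeps the DPO stage a gentle preference correction rather than a large policy shift.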
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "GawinGowin/qwen3-4b-struct-output"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto",
)

# Generate structured output via the chat template:
messages = [{"role": "user", "content": "Convert to JSON: name Alice, age 30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Sources & License
- Base model: Qwen/Qwen3-4B-Instruct-2507
- SFT data: u-10bei/structured_data_with_cot_dataset_512_v2, daichira/structured-hard-sft-4k, daichira/structured-5k-mix-sft — MIT License
- DPO data: u-10bei/dpo-dataset-qwen-cot — MIT License
- License: Apache 2.0
- Compliance: Users must comply with the MIT licenses of the training datasets and with the original terms of use of the base model.