# SFT Merged Model (DPO Base)

This model is Qwen/Qwen3-4B-Instruct-2507 with a Supervised Fine-Tuning (SFT) LoRA adapter fully merged into the base weights and saved in 16-bit precision.

It serves as the base model for subsequent DPO training.

## Training Pipeline

```
Qwen/Qwen3-4B-Instruct-2507  (base)
  |
  v
SFT via QLoRA (4-bit, Unsloth)
  - Objective: structured output accuracy (JSON / YAML / XML / TOML / CSV)
  - Adapter:   GawinGowin/lora-struct-output  (private)
  |
  v
Merge adapter into full weights -> this model
  |
  v
DPO -> GawinGowin/dpo-struct-output-sfted
```
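The merge step folds the adapter's low-rank update into the frozen base weights as W' = W + (alpha / r) · B A. A minimal plain-Python sketch of that arithmetic on toy 2x2 matrices (dimensions and values are illustrative; the real model applies this with r=64, alpha=128 to far larger projection matrices):

```python
# Toy LoRA merge: W' = W + (alpha / r) * B @ A.
# All matrices here are illustrative stand-ins, not real model weights.

def matmul(X, Y):
    """Plain-Python matrix multiply for the sketch."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0, 0.0],
     [0.0, 1.0]]            # frozen base weight (2x2)
B = [[0.5],
     [0.25]]                # LoRA B: d_out x r (r = 1 in this toy)
A = [[0.1, 0.2]]            # LoRA A: r x d_in
alpha, r = 128, 64          # values from the SFT config
scale = alpha / r           # = 2.0

delta = matmul(B, A)        # low-rank update B @ A
W_merged = [[W[i][j] + scale * delta[i][j] for j in range(2)]
            for i in range(2)]
```

After the merge, a single matrix multiply with `W_merged` reproduces base-plus-adapter behavior, which is why the merged model can be loaded and run without any PEFT machinery.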

## SFT Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit) + Unsloth |
| Training objective | Structured output (JSON / YAML / XML / TOML / CSV) |
| Dataset | u-10bei/structured_data_with_cot_dataset_512_v2 |
| Max sequence length | 512 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| LoRA r / alpha | 64 / 128 |
| CoT masking | Enabled (loss applied only after the `Output:` marker) |
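CoT masking means the cross-entropy loss is computed only on tokens after the `Output:` marker, so the model conditions on the chain-of-thought without being trained to reproduce it. A hypothetical sketch at the token level (the real pipeline operates on tokenizer IDs; plain word tokens stand in here, and `mask_cot` is an illustrative helper, not part of any library):

```python
IGNORE_INDEX = -100  # label value that Hugging Face loss functions skip

def mask_cot(tokens, marker="Output:"):
    """Return labels with everything up to and including the marker
    masked out, so loss applies only to the final structured answer."""
    labels = list(tokens)
    cut = tokens.index(marker) + 1   # position just past the marker
    for i in range(cut):
        labels[i] = IGNORE_INDEX
    return labels

tokens = ["Reason", "step", "by", "step.", "Output:", "{", '"a"', ":", "1", "}"]
labels = mask_cot(tokens)
# Only the JSON tokens after "Output:" keep their labels.
```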

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Use the final DPO model for inference
model_id = "GawinGowin/dpo-struct-output-sfted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto",
)
```
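Because the training objective is structured-output accuracy, generations are usually validated by parsing them. A minimal stdlib check, using a hypothetical response string in place of an actual `model.generate()` call:

```python
import json

# Hypothetical model output; a real run would decode it from model.generate().
response = '{"name": "Qwen", "params_b": 4}'

try:
    parsed = json.loads(response)
    is_valid = isinstance(parsed, dict)
except json.JSONDecodeError:
    parsed, is_valid = None, False
```

The same pattern applies to the other target formats with the matching parser (e.g. `tomllib` for TOML on Python 3.11+, `csv` for CSV).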

## Sources & License
