# SFT Merged Model (DPO Base)

This model is Qwen/Qwen3-4B-Instruct-2507 with a Supervised Fine-Tuning (SFT) LoRA adapter fully merged into the base weights and saved in 16-bit precision.

It serves as the base model for subsequent DPO training.

## Training Pipeline

```
Qwen/Qwen3-4B-Instruct-2507  (base)
  |
  v
SFT via QLoRA (4-bit, Unsloth)
  - Objective: structured output accuracy (JSON / YAML / XML / TOML / CSV)
  - Adapter:   GawinGowin/lora-struct-output  (private)
  |
  v
Merge adapter into full weights -> this model
  |
  v
DPO -> GawinGowin/dpo-struct-output-sfted
```
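The merge step folds the adapter's low-rank update into the frozen base weights as W' = W + (alpha / r) · B A. A minimal plain-Python sketch of that arithmetic on toy 2x2 matrices (dimensions and values are illustrative; the real model applies this with r=64, alpha=128 to far larger projection matrices):

```python
# Toy LoRA merge: W' = W + (alpha / r) * B @ A.
# All matrices here are illustrative stand-ins, not real model weights.

def matmul(X, Y):
    """Plain-Python matrix multiply for the sketch."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

W = [[1.0, 0.0],
     [0.0, 1.0]]            # frozen base weight (2x2)
B = [[0.5],
     [0.25]]                # LoRA B: d_out x r (r = 1 in this toy)
A = [[0.1, 0.2]]            # LoRA A: r x d_in
alpha, r = 128, 64          # values from the SFT config
scale = alpha / r           # = 2.0

delta = matmul(B, A)        # low-rank update B @ A
W_merged = [[W[i][j] + scale * delta[i][j] for j in range(2)]
            for i in range(2)]
```

After the merge, a single matrix multiply with `W_merged` reproduces base-plus-adapter behavior, which is why the merged model can be loaded and run without any PEFT machinery.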

## SFT Configuration

| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit) + Unsloth |
| Training objective | Structured output (JSON / YAML / XML / TOML / CSV) |
| Dataset | u-10bei/structured_data_with_cot_dataset_512_v2 |
| Max sequence length | 512 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| LoRA r / alpha | 64 / 128 |
| CoT masking | Enabled (loss applied only after the `Output:` marker) |
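CoT masking means the cross-entropy loss is computed only on tokens after the `Output:` marker, so the model conditions on the chain-of-thought without being trained to reproduce it. A hypothetical sketch at the token level (the real pipeline operates on tokenizer IDs; plain word tokens stand in here, and `mask_cot` is an illustrative helper, not part of any library):

```python
IGNORE_INDEX = -100  # label value that Hugging Face loss functions skip

def mask_cot(tokens, marker="Output:"):
    """Return labels with everything up to and including the marker
    masked out, so loss applies only to the final structured answer."""
    labels = list(tokens)
    cut = tokens.index(marker) + 1   # position just past the marker
    for i in range(cut):
        labels[i] = IGNORE_INDEX
    return labels

tokens = ["Reason", "step", "by", "step.", "Output:", "{", '"a"', ":", "1", "}"]
labels = mask_cot(tokens)
# Only the JSON tokens after "Output:" keep their labels.
```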

## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Use the final DPO model for inference
model_id = "GawinGowin/dpo-struct-output-sfted"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto",
)
```
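Because the training objective is structured-output accuracy, generations are usually validated by parsing them. A minimal stdlib check, using a hypothetical response string in place of an actual `model.generate()` call:

```python
import json

# Hypothetical model output; a real run would decode it from model.generate().
response = '{"name": "Qwen", "params_b": 4}'

try:
    parsed = json.loads(response)
    is_valid = isinstance(parsed, dict)
except json.JSONDecodeError:
    parsed, is_valid = None, False
```

The same pattern applies to the other target formats with the matching parser (e.g. `tomllib` for TOML on Python 3.11+, `csv` for CSV).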

## Sources & License
