# SFT Merged Model (DPO Base)

This model is a fully merged 16-bit version of Qwen/Qwen3-4B-Instruct-2507 with a Supervised Fine-Tuning (SFT) LoRA adapter merged into the base weights.
It serves as the base model for subsequent DPO training.
## Training Pipeline

```
Qwen/Qwen3-4B-Instruct-2507 (base)
        |
        v
SFT via QLoRA (4-bit, Unsloth)
  - Objective: structured output accuracy (JSON / YAML / XML / TOML / CSV)
  - Adapter: GawinGowin/lora-struct-output (private)
        |
        v
Merge adapter into full weights -> this model
        |
        v
DPO -> GawinGowin/dpo-struct-output-sfted
```
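The "merge adapter into full weights" step can be illustrated numerically: each LoRA-adapted layer's weight becomes `W' = W + (alpha / r) * (B @ A)`, after which the adapter matrices are discarded. Below is a toy pure-Python sketch of that arithmetic (the actual merge was performed with Unsloth on the fp16 tensors, not with this code):

```python
# Toy sketch of a LoRA merge: fold the low-rank delta into the base weight.
# W is m x n, B is m x r, A is r x n; r and alpha match the SFT config (64 / 128).

def matmul(B, A):
    """Multiply an (m x r) matrix by an (r x n) matrix, as nested lists."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A)))
             for j in range(len(A[0]))] for i in range(len(B))]

def merge_lora(W, A, B, r, alpha):
    """Return W' = W + (alpha / r) * (B @ A)."""
    delta = matmul(B, A)
    scale = alpha / r
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]
```

With `r = 64` and `alpha = 128` (the values in the table below), the delta is scaled by 2 before being added to the base weights.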
## SFT Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit) + Unsloth |
| Training objective | Structured output (JSON / YAML / XML / TOML / CSV) |
| Dataset | u-10bei/structured_data_with_cot_dataset_512_v2 |
| Max sequence length | 512 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| LoRA r / alpha | 64 / 128 |
| CoT masking | Enabled (loss computed only on tokens after the `Output:` marker) |
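CoT masking means the chain-of-thought tokens before the `Output:` marker are excluded from the loss by setting their labels to the ignore index (-100, the Hugging Face convention). A minimal sketch of that labeling step, using plain token-id lists (the marker token ids and helper name are illustrative assumptions, not part of the training code):

```python
IGNORE_INDEX = -100  # labels with this value are skipped by the cross-entropy loss

def mask_cot_labels(input_ids, marker_ids):
    """Copy input_ids into labels, ignoring everything up to and including
    the last occurrence of the marker token sequence (e.g. "Output:")."""
    labels = list(input_ids)
    end = -1
    for i in range(len(input_ids) - len(marker_ids) + 1):
        if input_ids[i:i + len(marker_ids)] == list(marker_ids):
            end = i + len(marker_ids)
    if end == -1:
        # No marker found: mask the whole example to be safe.
        return [IGNORE_INDEX] * len(labels)
    for i in range(end):
        labels[i] = IGNORE_INDEX
    return labels
```

For example, with `input_ids = [5, 6, 7, 8, 9]` and marker `[7]`, the labels become `[-100, -100, -100, 8, 9]`, so only the structured answer contributes to the loss.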
## Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Use the DPO model for inference
model_id = "GawinGowin/dpo-struct-output-sfted"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
```
## Sources & License
- Base model: Qwen/Qwen3-4B-Instruct-2507
- SFT adapter: GawinGowin/lora-struct-output (private)
- Training data: u-10bei/structured_data_with_cot_dataset_512_v2 — MIT License
- License: Apache 2.0
- Compliance: Users must comply with the MIT license of the training dataset and the base model's original terms of use.