Qwen3-4B Structured Data Expert (Exp13 - DPO with System Prompt)

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 using Direct Preference Optimization (DPO).

This repository contains a LoRA adapter trained for structured data generation tasks (JSON, YAML, TOML, XML, CSV, etc.).

Key Feature

Training and inference formats are fully aligned by embedding the system prompt directly into the DPO training data, which significantly improves output quality.

Training Configuration

| Parameter | Value |
|-----------|-------|
| Base model | Qwen/Qwen3-4B-Instruct-2507 + SFT (Exp5) |
| Method | DPO (Direct Preference Optimization) |
| Dataset | u-10bei/dpo-dataset-qwen-cot |
| LoRA rank (r) | 16 |
| LoRA alpha | 32 |
| Learning rate | 5e-7 |
| Epochs | 2 |
| Batch size | 4 (gradient accumulation: 2) |
| Beta | 0.1 |
| Max length | 1024 |
| Max prompt length | 512 |
| Optimizer | AdamW |
| Warmup ratio | 0.1 |
| Seed | 3407 |
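As a rough sketch, the hyperparameters above could map onto a TRL `DPOConfig` and a PEFT `LoraConfig` as follows. The actual training script is not included in this card, so the output directory, target modules, and optimizer name are assumptions:

```python
from trl import DPOConfig
from peft import LoraConfig

# LoRA settings from the table above; target modules are an assumption.
peft_config = LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM")

# Training hyperparameters from the table; output_dir is hypothetical.
training_args = DPOConfig(
    output_dir="qwen3-4b-structeval-exp13",
    beta=0.1,
    learning_rate=5e-7,
    num_train_epochs=2,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    max_length=1024,
    max_prompt_length=512,
    warmup_ratio=0.1,
    seed=3407,
    optim="adamw_torch",
)
```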

System Prompt (used at inference)

You are a structured data expert. Output the requested format directly without any explanation, preamble, or markdown code blocks. Do not write ```json, ```yaml, ```toml, ```xml, ```csv or similar. Output only the raw structured data.

Key Improvements over Baseline

  • System prompt embedded in DPO training: Training and inference formats are fully consistent
  • Clean chosen responses: Only the structured data portion extracted (no code blocks, no preamble)
  • Code block suppression: 0% code block usage at inference (vs ~70% in base DPO)
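The first two points above can be sketched as a small preprocessing step. This is an illustrative reconstruction, not the actual dataset pipeline: `extract_structured_data` strips code fences and preamble from chosen responses, and `build_dpo_example` embeds the system prompt into each training pair:

```python
import re

# Matches a fenced code block (```json ... ```) and captures its body.
FENCE_RE = re.compile(r"```[a-zA-Z]*\n(.*?)```", re.DOTALL)

def extract_structured_data(response: str) -> str:
    """Return only the raw structured data, discarding fences and preamble."""
    m = FENCE_RE.search(response)
    if m:
        return m.group(1).strip()
    return response.strip()

def build_dpo_example(user_prompt: str, chosen_raw: str,
                      rejected_raw: str, system_prompt: str) -> dict:
    """Build one DPO pair with the system prompt embedded in the prompt turns."""
    return {
        "prompt": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "chosen": extract_structured_data(chosen_raw),
        "rejected": rejected_raw,  # e.g. a fenced or preambled variant
    }
```

Because every training example carries the same system prompt used at inference, the model never sees a format mismatch between training and deployment.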

Inference Example

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

BASE_MODEL_ID = "Qwen/Qwen3-4B-Instruct-2507"
ADAPTER_PATH = "tenyyprn/qwen3-4b-structeval-exp13"

SYSTEM_PROMPT = (
    "You are a structured data expert. "
    "Output the requested format directly without any explanation, "
    "preamble, or markdown code blocks. "
    "Do not write ```json, ```yaml, ```toml, ```xml, ```csv or similar. "
    "Output only the raw structured data."
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(BASE_MODEL_ID, torch_dtype=torch.float16, device_map="auto")
model = PeftModel.from_pretrained(model, ADAPTER_PATH)
model = model.merge_and_unload()  # merge LoRA weights into the base model for faster inference
model.eval()

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "Convert to JSON: name=Alice, age=30, city=Tokyo"},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)  # follow device_map instead of hardcoding "cuda"
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
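Since the model emits raw structured data with no code fences, JSON outputs can be parsed directly. A small sanity-check helper (an illustration, not part of the model card's pipeline):

```python
import json

def validate_json_output(raw: str) -> dict:
    """Parse the model's raw output; raise ValueError if a fence leaked through."""
    stripped = raw.strip()
    if stripped.startswith("```"):
        raise ValueError("unexpected markdown code fence in output")
    return json.loads(stripped)

# Example with the expected output of the prompt above:
validate_json_output('{"name": "Alice", "age": 30, "city": "Tokyo"}')
```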

Citations

@inproceedings{rafailov2023direct,
    title        = {{Direct Preference Optimization: Your Language Model is Secretly a Reward Model}},
    author       = {Rafael Rafailov and Archit Sharma and Eric Mitchell and Christopher D. Manning and Stefano Ermon and Chelsea Finn},
    year         = 2023,
    booktitle    = {Advances in Neural Information Processing Systems 36},
    url          = {http://papers.nips.cc/paper_files/paper/2023/hash/a85b405ed65c6477a4fe8302b5e06ce7-Abstract-Conference.html},
}
