qwen3-4b-struct-dpo-v16-merged

This repository contains a two-stage fine-tuned model based on deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged.

  • Stage 1 (SFT, payload-only): improves format stability and parseability by training only on the payloads extracted after the Output: marker.
  • Stage 2 (DPO, pure-fence-only): penalizes Markdown code fences and other wrappers while leaving the payload content itself unchanged.

This repo (deepkick/qwen3-4b-struct-dpo-v16-merged) contains the final merged weights (no adapter loading required).

Training data

  • Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
  • Preprocessing: mechanical extraction of payload after Output: (no LLM generation)
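The extraction step can be illustrated with a minimal sketch. This is not the exact preprocessing script; it assumes the marker is the literal string "Output:" and that everything after the first occurrence is the payload.

```python
def extract_payload(text: str, marker: str = "Output:") -> str:
    """Mechanically return everything after the first `marker` occurrence,
    stripped of surrounding whitespace. No LLM generation is involved."""
    idx = text.find(marker)
    if idx == -1:
        return ""  # no marker found; such a sample would be skipped
    return text[idx + len(marker):].strip()

sample = 'Reasoning: ...\nOutput: {"name": "Alice", "age": 30}'
print(extract_payload(sample))  # → {"name": "Alice", "age": 30}
```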

Training configuration

Stage 1: payload-only SFT

  • Epochs: 1
  • Learning rate: 2e-7
  • (Optional artifact) LoRA adapter: deepkick/qwen3-4b-payload-sft-v16

Stage 2: pure-fence-only DPO

  • Epochs: 1
  • Learning rate: 2e-7
  • Beta: 0.1
  • Max sequence length: 2048
  • Preference rule: chosen = payload-only, rejected = fenced(payload) (payload itself unchanged)
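The preference rule above can be sketched as a pair constructor. This is a hypothetical illustration of the data shape, not the actual training script; the fence language tag ("json") is an assumption.

```python
def make_preference_pair(payload: str, lang_tag: str = "json") -> dict:
    """Build a DPO pair where chosen is the raw payload and rejected is the
    same payload wrapped in a Markdown code fence. Only the wrapper differs,
    so the preference signal targets the fence, not the content."""
    return {
        "chosen": payload,
        "rejected": f"```{lang_tag}\n{payload}\n```",
    }

pair = make_preference_pair('{"id": 1}')
print(pair["rejected"])
```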

Usage

Transformers (merged model)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepkick/qwen3-4b-struct-dpo-v16-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # required so **inputs unpacks into generate()
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0], skip_special_tokens=True))
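If the model ever regresses and emits a fenced payload, a small defensive post-processing helper can recover the raw content. This is an optional sketch, not part of the model's contract; the regex assumes a single outer Markdown fence with an optional language tag.

```python
import re

def strip_code_fences(text: str) -> str:
    """Remove one outer Markdown code fence (e.g. ```json ... ```) if present;
    otherwise return the text unchanged apart from whitespace stripping."""
    m = re.match(r"^```[\w-]*\n(.*?)\n?```\s*$", text.strip(), re.DOTALL)
    return m.group(1) if m else text.strip()

print(strip_code_fences('```json\n{"a": 1}\n```'))  # → {"a": 1}
```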

Notes

  • This model is optimized to emit structured outputs as raw payloads, avoiding code fences and extra wrappers, for use cases where the evaluation expects unfenced output.
  • Please follow the base model license and dataset terms.