qwen3-4b-struct-dpo-v16-merged
This repository contains a two-stage fine-tuned model based on deepkick/qwen3-4b-struct-dpo-v14-b0.10-L2048-merged.
- Stage 1 (SFT, payload-only): improves format stability / parseability by training on payloads extracted after `Output:`.
- Stage 2 (DPO, pure-fence-only): penalizes Markdown code fences / wrappers while keeping the payload content unchanged.
This repo (deepkick/qwen3-4b-struct-dpo-v16-merged) contains the final merged weights (no adapter loading required).
Training data
- Dataset: u-10bei/structured_data_with_cot_dataset_512_v2
- Preprocessing: mechanical extraction of the payload after `Output:` (no LLM generation)
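The mechanical extraction step might look like the following sketch. This is an assumption about the preprocessing, not the dataset's actual script; the marker string and whitespace handling are illustrative:

```python
def extract_payload(completion: str) -> str:
    """Return the text after the last 'Output:' marker, stripped of
    surrounding whitespace. If no marker is present, return the whole
    completion stripped. (Illustrative sketch, not the official script.)"""
    marker = "Output:"
    idx = completion.rfind(marker)
    if idx == -1:
        return completion.strip()
    return completion[idx + len(marker):].strip()

print(extract_payload('Reasoning...\nOutput: {"a": 1}'))  # -> {"a": 1}
```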
Training configuration
Stage 1: payload-only SFT
- Epochs: 1
- Learning rate: 2e-7
- (Optional artifact) LoRA adapter: deepkick/qwen3-4b-payload-sft-v16
Stage 2: pure-fence-only DPO
- Epochs: 1
- Learning rate: 2e-7
- Beta: 0.1
- Max sequence length: 2048
- Preference rule: chosen = payload-only, rejected = fenced(payload) (payload itself unchanged)
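Given the preference rule above, constructing a DPO pair can be sketched as follows. The fence language tag and record layout are assumptions; the dataset's actual construction may differ:

```python
def fence(payload: str, lang: str = "json") -> str:
    """Wrap a payload in a Markdown code fence (the 'rejected' form).
    The language tag is an illustrative assumption."""
    return f"```{lang}\n{payload}\n```"

def make_preference_pair(payload: str) -> dict:
    """chosen = the raw payload; rejected = the same payload inside a
    code fence. The payload text itself is identical in both."""
    return {"chosen": payload, "rejected": fence(payload)}

pair = make_preference_pair('{"a": 1}')
```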
Usage
Transformers (merged model)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepkick/qwen3-4b-struct-dpo-v16-merged"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Your prompt here"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,  # required so that **inputs unpacks into generate()
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```
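Since the model is trained to emit raw payloads, a quick sanity check that the decoded output contains no Markdown fence can be useful in a pipeline. This is a minimal sketch, not part of the model's API:

```python
import re

def has_code_fence(text: str) -> bool:
    """True if any line of the text starts with a Markdown code fence.
    Useful as a post-generation check when downstream parsing expects
    a raw payload."""
    return bool(re.search(r"^```", text, flags=re.MULTILINE))

assert not has_code_fence('{"a": 1}')
```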
Notes
- This model is optimized to emit structured outputs without code fences or extra wrappers, for evaluations that expect raw payloads.
- Please follow the base model license and dataset terms.
Model tree for deepkick/qwen3-4b-payload-sft-v16
Base model
Qwen/Qwen3-4B-Instruct-2507