# dpo-qwen-structeval-v2
This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507, trained with SFT followed by DPO via the Unsloth library.
This repository contains the fully merged 16-bit weights, so no adapter loading is required.
## Training Pipeline
1. SFT (Supervised Fine-Tuning): the base model is fine-tuned with a LoRA adapter (Sakai0920/qwen3-4b-structured-output-lora-v27)
2. Merge: the SFT adapter is merged into the base model
3. DPO (Direct Preference Optimization): the merged model is further trained with DPO to improve structured-output quality
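Conceptually, the merge step (step 2) folds the low-rank LoRA update into the base weight matrix: `W_merged = W_base + (alpha / r) * (B @ A)`. The toy sketch below illustrates that arithmetic with tiny made-up matrices in pure Python; it is not the actual merge call used in training (that is done via the Unsloth/peft tooling), just the math behind it:

```python
# Toy illustration of the "merge" step: fold a LoRA update into base weights.
# LoRA stores a low-rank delta B @ A, applied with scale alpha / r:
#   W_merged = W_base + (alpha / r) * (B @ A)
# Dimensions here are made up; the DPO stage of this model uses r=16, alpha=32,
# which gives the same scale factor of 2.0 used below.

def matmul(B, A):
    """Multiply a (d x r) matrix by an (r x k) matrix, as lists of lists."""
    d, r, k = len(B), len(A), len(A[0])
    return [[sum(B[i][t] * A[t][j] for t in range(r)) for j in range(k)]
            for i in range(d)]

def merge_lora(W, B, A, r, alpha):
    """Return W + (alpha / r) * (B @ A), leaving W untouched."""
    scale = alpha / r
    delta = matmul(B, A)
    return [[W[i][j] + scale * delta[i][j] for j in range(len(W[0]))]
            for i in range(len(W))]

# 2x2 base weight, rank-1 LoRA factors (all values invented for illustration)
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]         # d x r
A = [[0.5, 0.25]]          # r x k
merged = merge_lora(W, B, A, r=1, alpha=2)  # scale = alpha / r = 2.0
```

After merging, the adapter matrices are discarded and the model is served as plain dense weights, which is why this repository needs no adapter loading.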
## DPO Training Configuration
| Parameter | Value |
|---|---|
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| SFT adapter | Sakai0920/qwen3-4b-structured-output-lora-v27 |
| DPO dataset | u-10bei/dpo-dataset-qwen-cot |
| Epochs | 1 |
| Learning rate | 5e-07 |
| Beta | 0.5 |
| Max sequence length | 1024 |
| Max prompt length | 512 |
| DPO LoRA | r=16, alpha=32, dropout=0.0 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Precision | fp16 |
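The role of beta in the table can be read off the DPO objective: the loss pushes the policy to assign a larger log-probability margin to the chosen answer than to the rejected one, relative to the reference model, and beta scales how strongly deviations count. A minimal pure-Python sketch of the per-example loss (the log-probabilities below are made-up numbers, not outputs of this model):

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.5):
    """Per-example DPO loss: -log sigmoid(beta * (chosen - rejected logratio))."""
    chosen_logratio = logp_chosen - ref_logp_chosen
    rejected_logratio = logp_rejected - ref_logp_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Invented log-probs: the policy prefers the chosen answer a bit more
# strongly than the reference model does, so the loss drops below log(2).
loss = dpo_loss(-10.0, -12.0, -10.5, -11.5, beta=0.5)
```

With beta = 0.5 (as in this run), a one-nat preference margin moves the loss only moderately, keeping the policy close to the SFT-merged reference.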
## Usage
Since this is a merged model, you can use it directly with `transformers` or with the competition inference code.
### With transformers
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Sakai0920/dpo-qwen-structeval-v2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Your question here"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding: use do_sample=False rather than temperature=0.0,
# which transformers does not accept as a sampling temperature.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### With competition inference code
```python
MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "Sakai0920/dpo-qwen-structeval-v2"
```
## Sources & License
- SFT Adapter: Sakai0920/qwen3-4b-structured-output-lora-v27
- DPO Dataset: u-10bei/dpo-dataset-qwen-cot
- License: Apache 2.0 (follows the base model's license terms)