# dpo-qwen-structeval-v2

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 using SFT + DPO via the Unsloth library.

This repository contains the fully merged 16-bit weights. No adapter loading is required.

## Training Pipeline

1. **SFT (Supervised Fine-Tuning):** the base model is fine-tuned with a LoRA adapter (Sakai0920/qwen3-4b-structured-output-lora-v27)
2. **Merge:** the SFT adapter is merged into the base model
3. **DPO (Direct Preference Optimization):** the merged model is further trained with DPO to improve structured output quality
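For intuition on step 3: DPO trains directly on preference pairs, and its per-example loss is `-log sigmoid(beta * margin)`, where the margin compares how much more the policy favors the chosen response over the rejected one, relative to the frozen reference (the merged SFT model). A minimal pure-Python sketch of that loss (illustrative only, not the Unsloth/TRL implementation):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.5):
    """Per-example DPO loss: -log sigmoid(beta * (chosen ratio - rejected ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss falls below log(2) ~ 0.693.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.5)  # margin = 1.5
```

The `beta=0.5` default mirrors the value used in this run (see the table below); higher beta penalizes divergence from the reference model more sharply.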

## DPO Training Configuration

| Parameter | Value |
| --- | --- |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| SFT adapter | Sakai0920/qwen3-4b-structured-output-lora-v27 |
| DPO dataset | u-10bei/dpo-dataset-qwen-cot |
| Epochs | 1 |
| Learning rate | 5e-07 |
| Beta | 0.5 |
| Max sequence length | 1024 |
| Max prompt length | 512 |
| DPO LoRA | r=16, alpha=32, dropout=0.0 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Precision | fp16 |
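For reproduction, the table collects into a plain config dict. Key names below loosely follow TRL's `DPOConfig` convention; they are assumptions for illustration, not taken from the actual training script:

```python
# Hyperparameters from the table above. Key names are illustrative
# (TRL DPOConfig style), not copied from the original training code.
dpo_config = {
    "num_train_epochs": 1,
    "learning_rate": 5e-7,
    "beta": 0.5,
    "max_length": 1024,
    "max_prompt_length": 512,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "fp16": True,
}

# Separate LoRA settings applied during the DPO stage.
lora_config = {"r": 16, "lora_alpha": 32, "lora_dropout": 0.0}
```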

## Usage

Since this is a merged model, you can use it directly with `transformers` or with the competition inference code.

### With transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Sakai0920/dpo-qwen-structeval-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Your question here"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding: temperature=0.0 is not valid when sampling,
# so disable sampling instead.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
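Since the model targets structured output, the decoded text is usually parsed afterwards. A minimal helper for extracting a JSON object from the generation (the function name and the lenient find/rfind strategy are illustrative assumptions; real outputs may need stricter handling):

```python
import json

def parse_json_output(decoded: str):
    """Pull the first-to-last brace span out of the decoded text and parse it."""
    start = decoded.find("{")
    end = decoded.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in output")
    return json.loads(decoded[start:end + 1])

# Tolerates prose around the JSON payload.
result = parse_json_output('Sure! {"name": "Qwen", "params": "4B"}')
```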

### With competition inference code

```python
MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "Sakai0920/dpo-qwen-structeval-v2"
```
