# dpo-qwen-structeval-v2

This model is a fine-tuned version of Qwen/Qwen3-4B-Instruct-2507 using SFT + DPO via the Unsloth library.

This repository contains the fully merged 16-bit weights. No adapter loading is required.

## Training Pipeline

1. **SFT (Supervised Fine-Tuning):** the base model is fine-tuned with a LoRA adapter (Sakai0920/qwen3-4b-structured-output-lora-v27)
2. **Merge:** the SFT adapter is merged into the base model
3. **DPO (Direct Preference Optimization):** the merged model is further trained with DPO to improve structured output quality
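For intuition on step 3: DPO trains directly on preference pairs, and its per-example loss is `-log sigmoid(beta * margin)`, where the margin compares how much more the policy favors the chosen response over the rejected one, relative to the frozen reference (the merged SFT model). A minimal pure-Python sketch of that loss (illustrative only, not the Unsloth/TRL implementation):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.5):
    """Per-example DPO loss: -log sigmoid(beta * (chosen ratio - rejected ratio))."""
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_ratio - rejected_ratio)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# When the policy prefers the chosen response more than the reference does,
# the margin is positive and the loss falls below log(2) ~ 0.693.
loss = dpo_loss(-10.0, -14.0, -12.0, -13.0, beta=0.5)  # margin = 1.5
```

The `beta=0.5` default mirrors the value used in this run (see the table below); higher beta penalizes divergence from the reference model more sharply.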

## DPO Training Configuration

| Parameter | Value |
| --- | --- |
| Base model | Qwen/Qwen3-4B-Instruct-2507 |
| SFT adapter | Sakai0920/qwen3-4b-structured-output-lora-v27 |
| DPO dataset | u-10bei/dpo-dataset-qwen-cot |
| Epochs | 1 |
| Learning rate | 5e-07 |
| Beta | 0.5 |
| Max sequence length | 1024 |
| Max prompt length | 512 |
| DPO LoRA | r=16, alpha=32, dropout=0.0 |
| Optimizer | adamw_8bit |
| Weight decay | 0.01 |
| Warmup ratio | 0.1 |
| Precision | fp16 |
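For reproduction, the table collects into a plain config dict. Key names below loosely follow TRL's `DPOConfig` convention; they are assumptions for illustration, not taken from the actual training script:

```python
# Hyperparameters from the table above. Key names are illustrative
# (TRL DPOConfig style), not copied from the original training code.
dpo_config = {
    "num_train_epochs": 1,
    "learning_rate": 5e-7,
    "beta": 0.5,
    "max_length": 1024,
    "max_prompt_length": 512,
    "optim": "adamw_8bit",
    "weight_decay": 0.01,
    "warmup_ratio": 0.1,
    "fp16": True,
}

# Separate LoRA settings applied during the DPO stage.
lora_config = {"r": 16, "lora_alpha": 32, "lora_dropout": 0.0}
```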

## Usage

Since this is a merged model, you can use it directly with `transformers` or with the competition inference code.

### With transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "Sakai0920/dpo-qwen-structeval-v2"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

prompt = "Your question here"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(
    messages, tokenize=True, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Greedy decoding: temperature=0.0 is not valid when sampling,
# so disable sampling instead.
outputs = model.generate(inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
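Since the model targets structured output, the decoded text is usually parsed afterwards. A minimal helper for extracting a JSON object from the generation (the function name and the lenient find/rfind strategy are illustrative assumptions; real outputs may need stricter handling):

```python
import json

def parse_json_output(decoded: str):
    """Pull the first-to-last brace span out of the decoded text and parse it."""
    start = decoded.find("{")
    end = decoded.rfind("}")
    if start == -1 or end == -1 or end < start:
        raise ValueError("no JSON object found in output")
    return json.loads(decoded[start:end + 1])

# Tolerates prose around the JSON payload.
result = parse_json_output('Sure! {"name": "Qwen", "params": "4B"}')
```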

### With competition inference code

```python
MODEL_SOURCE = "merged"
MERGED_MODEL_ID_OR_PATH = "Sakai0920/dpo-qwen-structeval-v2"
```
