LoRA: StructEval-T (Qwen3-4B-Instruct-2507) SFT+DPO (sftdpo_v1)
This repository contains a LoRA adapter (PEFT) for StructEval-T style structured output tasks.
It was trained in two stages, SFT followed by DPO, on top of Qwen/Qwen3-4B-Instruct-2507.
⚠️ This repo is NOT a full merged model.
Load the base model first, then apply this adapter.
Model Details
- Adapter type: LoRA (PEFT)
- Base model: Qwen/Qwen3-4B-Instruct-2507 (license: Apache-2.0)
- What this adapter aims to improve:
  - Better compliance with structured-output constraints (JSON / schema / format-sensitive tasks)
  - Better preference alignment via DPO on a preference dataset
Training Data
- DPO preference dataset (main): u-10bei/dpo-dataset-qwen-cot
- SFT starting point (adapter): RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v1 (this adapter was used as the initialization before DPO)
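DPO trains on preference pairs: one prompt with a preferred ("chosen") and a dispreferred ("rejected") response. As a hedged illustration, assuming the dataset follows TRL's standard preference format with `prompt`, `chosen`, and `rejected` fields (the actual column names of u-10bei/dpo-dataset-qwen-cot are not stated in this card and may differ), a record and a minimal sanity check could look like this. The record contents and the `is_valid_pair` helper are hypothetical.

```python
# Hypothetical example record. The field names follow TRL's standard
# "prompt"/"chosen"/"rejected" preference format; this is an assumption
# about the dataset, not a description of its actual schema.
example = {
    "prompt": "Return a JSON with keys a,b,c and integer values.",
    "chosen": '{"a": 1, "b": 2, "c": 3}',
    "rejected": "Sure! Here is the JSON: a=1, b=2, c=3",
}

REQUIRED_KEYS = ("prompt", "chosen", "rejected")


def is_valid_pair(record: dict) -> bool:
    """Hypothetical check: True if prompt/chosen/rejected are non-empty
    strings and the two responses actually differ."""
    for key in REQUIRED_KEYS:
        value = record.get(key)
        if not isinstance(value, str) or not value.strip():
            return False
    return record["chosen"] != record["rejected"]


print(is_valid_pair(example))  # True
```

Pairs whose chosen and rejected responses are identical carry no preference signal, which is why the check rejects them.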
Training Procedure
Stage 1) SFT (initialization)
- Start from Qwen/Qwen3-4B-Instruct-2507
- Apply the SFT LoRA adapter RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v1 as the initialization point
Stage 2) DPO (this adapter)
- Trainer: TRL DPOTrainer
- Framework: Transformers + PEFT (+ Unsloth where applicable)
- Key hyperparameters:
  - epochs: 1
  - learning rate: 1e-7
  - beta: 0.1
  - max_length: 1024
  - max_prompt_length: 512
  - per_device_train_batch_size: 1
  - gradient_accumulation_steps: 8
  - optimizer: adamw_8bit (OptimizerNames.ADAMW_8BIT)
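To give some intuition for `beta`, here is a sketch of the standard DPO objective, which is TRL's default (`loss_type="sigmoid"`): loss = -log σ(β · [(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]), where y_w is the chosen response, y_l the rejected one, and π_ref the frozen reference model. A small β such as 0.1 yields a soft margin that keeps the policy close to the reference. The log-probabilities below are made-up numbers, not values from this training run. The sketch also computes the effective batch size implied by the settings above: 1 × 8 = 8 pairs per optimizer step per device.

```python
import math

BETA = 0.1             # beta used for this adapter
PER_DEVICE_BATCH = 1   # per_device_train_batch_size
GRAD_ACCUM_STEPS = 8   # gradient_accumulation_steps
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM_STEPS  # pairs per optimizer step, per device


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    if x > 0:
        return x + math.log1p(math.exp(-x))
    return math.log1p(math.exp(x))


def dpo_sigmoid_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=BETA):
    """DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under the
    trainable policy or the frozen reference model.
    loss = -log(sigmoid(margin)) = softplus(-margin)
    """
    chosen_logratio = policy_chosen - ref_chosen
    rejected_logratio = policy_rejected - ref_rejected
    margin = beta * (chosen_logratio - rejected_logratio)
    return softplus(-margin)


# Illustrative (made-up) log-probs: the policy already slightly prefers "chosen".
loss = dpo_sigmoid_loss(-12.0, -15.0, -13.0, -14.0)
print(f"{loss:.4f}")      # 0.5981
print(EFFECTIVE_BATCH)    # 8
```

When the policy and reference agree (margin 0), the loss is log 2 ≈ 0.693. It falls toward 0 as the policy favors the chosen response more strongly than the reference does.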
How to Use
Transformers + PEFT (recommended)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_ID = "Qwen/Qwen3-4B-Instruct-2507"
# BASE_ID = "unsloth/Qwen3-4B-Instruct-2507"
ADAPTER_ID = "RinnRinnmini/qwen3-4b-structeval-sftdpo_v3-adapter"  # this repo (LoRA adapter)

# Tokenizer of the base model; fall back to EOS for padding if no pad token is set.
tok = AutoTokenizer.from_pretrained(BASE_ID, trust_remote_code=True, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

# 1) Load the base model (fp16 on GPU, fp32 on CPU).
base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
)
# 2) Apply the LoRA adapter on top of the base model.
model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

messages = [
    {"role": "user", "content": "Return a JSON with keys a,b,c and integer values."}
]
# Render the chat template and append the assistant turn header.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding for reproducible structured output
        pad_token_id=tok.eos_token_id,
    )

# out[0] contains the prompt followed by the completion; decode only the new tokens.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))
```
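Even with better format compliance, a reply may still wrap the JSON in prose or code fences, so downstream code should parse it defensively. The following is a minimal standard-library sketch, assuming you only need the first top-level JSON value in the reply. `extract_first_json` and `has_int_keys` are hypothetical helpers and are not part of Transformers or PEFT.

```python
import json


def extract_first_json(text: str):
    """Return the first JSON object or array found in `text`, or None.

    Tries json.JSONDecoder.raw_decode at each '{' or '[' position, so any
    surrounding prose or code fences are skipped.
    """
    decoder = json.JSONDecoder()
    for i, ch in enumerate(text):
        if ch in "{[":
            try:
                obj, _end = decoder.raw_decode(text, i)
                return obj
            except json.JSONDecodeError:
                continue
    return None


def has_int_keys(obj, keys=("a", "b", "c")) -> bool:
    """Check the example task's constraint: a dict with an integer value per key."""
    return isinstance(obj, dict) and all(
        isinstance(obj.get(k), int) and not isinstance(obj.get(k), bool) for k in keys
    )


reply = 'Here you go:\n```json\n{"a": 1, "b": 2, "c": 3}\n```'
parsed = extract_first_json(reply)
print(parsed, has_int_keys(parsed))  # {'a': 1, 'b': 2, 'c': 3} True
```

Pass only the newly generated text (not the prompt) to `extract_first_json`, so that nothing from the chat template is parsed by mistake.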
Limitations
- This is an adapter-only repository. You must load the base model separately.
Model tree for RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v4
Base model: Qwen/Qwen3-4B-Instruct-2507