LoRA: StructEval-T (Qwen3-4B-Instruct-2507) SFT+DPO (sftdpo_v1)

This repository contains a LoRA adapter (PEFT) for StructEval-T style structured output tasks.
It was trained in two stages, SFT followed by DPO, on top of Qwen/Qwen3-4B-Instruct-2507.

⚠️ This repo is NOT a full merged model.
Load the base model first, then apply this adapter.


Model Details

  • Adapter type: LoRA (PEFT)
  • Base model: Qwen/Qwen3-4B-Instruct-2507 (license: Apache-2.0)
  • What this adapter aims to improve:
    • Better compliance with structured output constraints (JSON / schema / format-sensitive tasks)
    • Better preference alignment via DPO on a preference dataset

Training Data

  • DPO preference dataset (main): u-10bei/dpo-dataset-qwen-cot
  • SFT starting point (adapter): RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v1 (This adapter was used as initialization before DPO.)

Training Procedure

Stage 1) SFT (initialization)

  • Start from Qwen/Qwen3-4B-Instruct-2507
  • Apply SFT LoRA adapter RinnRinnmini/lora_structeval_t_qwen3_4b_sft_v1 as the initialization point

Stage 2) DPO (this adapter)

  • Trainer: TRL DPOTrainer
  • Framework: Transformers + PEFT (+ Unsloth where applicable)
  • Key hyperparameters
    • epochs: 1
    • learning rate: 1e-07
    • beta: 0.1
    • max_length: 1024
    • max_prompt_length: 512
    • per_device_train_batch_size: 1
    • gradient_accumulation_steps: 8
    • optimizer: adamw_8bit (8-bit AdamW; logged as OptimizerNames.ADAMW_8BIT)
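
To make the hyperparameters above concrete, here is a minimal pure-Python sketch. It restates them as a dictionary, computes the effective batch size, and evaluates the standard DPO loss for a single preference pair using the listed beta. The dictionary key names follow the TRL DPOConfig / Transformers TrainingArguments naming convention, which is an assumption, as are the helper names effective_batch_size and dpo_loss and the num_devices default of 1 (the device count is not stated in this card). This is an illustration, not the training script used for this adapter.

```python
import math

# Hyperparameters as listed in this card. Key names are assumed to mirror
# TRL's DPOConfig / Transformers' TrainingArguments argument names.
DPO_HPARAMS = {
    "num_train_epochs": 1,
    "learning_rate": 1e-7,
    "beta": 0.1,
    "max_length": 1024,
    "max_prompt_length": 512,
    "per_device_train_batch_size": 1,
    "gradient_accumulation_steps": 8,
    "optim": "adamw_8bit",
}


def effective_batch_size(hparams, num_devices=1):
    """Samples contributing to one optimizer step (num_devices is an assumption)."""
    return (
        hparams["per_device_train_batch_size"]
        * hparams["gradient_accumulation_steps"]
        * num_devices
    )


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta):
    """Standard sigmoid DPO loss for one (chosen, rejected) pair.

    Inputs are summed log-probabilities of each response under the policy
    and under the frozen reference model.
    """
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable way.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


if __name__ == "__main__":
    print("effective batch size:", effective_batch_size(DPO_HPARAMS))
    # Policy prefers the chosen answer more than the reference does -> lower loss.
    print("loss:", dpo_loss(-1.0, -5.0, -2.0, -2.0, DPO_HPARAMS["beta"]))
```

With a per-device batch size of 1 and 8 accumulation steps, each optimizer step sees 8 preference pairs per device. A smaller beta lets the policy drift further from the reference for the same loss.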

How to Use

Transformers + PEFT (recommended)

from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel
import torch

BASE_ID = "Qwen/Qwen3-4B-Instruct-2507"
# BASE_ID = "unsloth/Qwen3-4B-Instruct-2507"

ADAPTER_ID = "RinnRinnmini/qwen3-4b-structeval-sftdpo_v3-adapter"  # this repo (LoRA adapter)

tok = AutoTokenizer.from_pretrained(BASE_ID, trust_remote_code=True, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token

base = AutoModelForCausalLM.from_pretrained(
    BASE_ID,
    device_map="auto",
    torch_dtype=torch.float16 if torch.cuda.is_available() else torch.float32,
    trust_remote_code=True,
)

model = PeftModel.from_pretrained(base, ADAPTER_ID)
model.eval()

messages = [
    {"role": "user", "content": "Return a JSON with keys a,b,c and integer values."}
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

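inputs = tok(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=False,  # greedy decoding for repeatable structured output
        pad_token_id=tok.pad_token_id,
    )

# Decode only the newly generated tokens, not the echoed prompt.
new_tokens = out[0][inputs["input_ids"].shape[1]:]
print(tok.decode(new_tokens, skip_special_tokens=True))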
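
Since the adapter targets structured output, it can be useful to check the generated text programmatically. The following is a minimal sketch using only the standard library. The helper names extract_json_object and has_int_values are hypothetical and not part of this repository. It assumes the model may wrap the JSON in extra prose and simply takes the first parseable object.

```python
import json


def extract_json_object(text):
    """Return the first JSON object that parses from `text`, or None."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting at index `start`
            # and ignores any trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            start = text.find("{", start + 1)
    return None


def has_int_values(obj, keys):
    """True if `obj` is a dict with exactly `keys`, all mapped to integers."""
    if not isinstance(obj, dict) or set(obj) != set(keys):
        return False
    # bool is a subclass of int in Python, so exclude it explicitly.
    return all(
        isinstance(obj[k], int) and not isinstance(obj[k], bool) for k in keys
    )


if __name__ == "__main__":
    sample = 'Here is the result: {"a": 1, "b": 2, "c": 3}'
    parsed = extract_json_object(sample)
    print(parsed, has_int_values(parsed, ["a", "b", "c"]))
```

In practice, the decoded generation from the snippet above would be passed in place of the sample string.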

Limitations

  • This is an adapter-only repository. You must load the base model separately.