StructEval Structured Output LoRA (SFT + DPO)

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, via Unsloth), trained in two stages: SFT followed by DPO.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured-output accuracy across five formats: JSON, YAML, XML, TOML, and CSV.

The model emits the structured data directly, without Chain-of-Thought reasoning, which reduces both parse failures and wasted tokens.

Training Pipeline

Stage 1: SFT (Supervised Fine-Tuning)

  • ~30,000 samples with assistant-only loss masking

Stage 2: DPO (Direct Preference Optimization)

  • 4,040 preference pairs (chosen vs rejected)
  • Beta: 0.1
  • Learning rate: 5e-7

Training Data

SFT Stage:

  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • daichira/structured-hard-sft-4k
  • daichira/structured-5k-mix-sft
  • daichira/structured-3k-mix-sft
  • u-10bei/structured_data_with_cot_dataset_512_v2
  • Rule-based conversion pair augmentation

DPO Stage:

  • u-10bei/dpo-dataset-qwen-cot (4,040 preference pairs)

Data augmentation (non-LLM, rule-based):

  • CoT extraction: removed reasoning text, kept structured output only
  • Rule-based format conversion pairs between all five formats
  • Gap filling for underrepresented task types (Text-to-XML, YAML-to-XML)
  • Random structure generation for diversity
  • Rebalancing to match evaluation distribution
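The augmentation scripts themselves are not published; as an illustration, the CoT-extraction step can be sketched as keeping only the final fenced block of a reasoning-heavy completion (the fence pattern and helper name below are assumptions, not the repository's actual code):

```python
import re

def strip_cot(completion: str) -> str:
    """Keep only the structured payload: the last fenced code block if one
    exists, otherwise the completion unchanged."""
    blocks = re.findall(r"```[a-z]*\n(.*?)```", completion, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else completion.strip()

cot = "Let me think step by step...\nThe answer is:\n```json\n{\"ok\": true}\n```"
print(strip_cot(cot))  # {"ok": true}
```

Applying such a filter to CoT-style datasets yields direct prompt-to-output pairs, matching the no-reasoning target behavior described above.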

Training Configuration

SFT Stage:

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit) via Unsloth
  • Max sequence length: 4096
  • Epochs: 2
  • Learning rate: 2e-5 (cosine schedule, warmup 10%)
  • LoRA: r=128, alpha=256
  • Target modules: q/k/v/o/gate/up/down projections
  • Batch size: per-device 2 × gradient accumulation 8 = effective 16
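The exact training scripts are not released; a minimal sketch of how the SFT stage could be reproduced with Unsloth and TRL follows. Hyperparameters match the list above; everything else (dataset variable, output directory, argument names beyond those hyperparameters) is an assumption:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit for QLoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=4096,
    load_in_4bit=True,
)
# Attach LoRA adapters with the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=sft_dataset,  # assumed: the mixed SFT data described above
    args=SFTConfig(
        num_train_epochs=2,
        learning_rate=2e-5,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        output_dir="sft-out",
    ),
)
trainer.train()
```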

DPO Stage:

  • Epochs: 1
  • Learning rate: 5e-7
  • Beta: 0.1
  • Max length: 1024
  • Max prompt length: 512
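The DPO stage continues from the SFT checkpoint; a hedged sketch using TRL (the `model`, `tokenizer`, and `dpo_dataset` variables are assumed to carry over from an SFT setup like the one above):

```python
from trl import DPOConfig, DPOTrainer

# Preference-tune the SFT adapter on chosen/rejected pairs.
trainer = DPOTrainer(
    model=model,                 # assumed: the SFT-stage PEFT model
    processing_class=tokenizer,
    train_dataset=dpo_dataset,   # assumed: u-10bei/dpo-dataset-qwen-cot pairs
    args=DPOConfig(
        num_train_epochs=1,
        learning_rate=5e-7,
        beta=0.1,
        max_length=1024,
        max_prompt_length=512,
        output_dir="dpo-out",
    ),
)
trainer.train()
```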

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "perryhsb/structeval-qwen3-4b-lora-dpo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
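Since the adapter targets directly parseable output, a thin validation helper is enough to check completions at inference time (the helper below is an illustrative sketch, not part of this repository):

```python
import json

def parse_json_strict(output: str):
    """Return the parsed object if the completion is directly valid JSON,
    otherwise None (the parse failure this adapter is trained to avoid)."""
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        return None

print(parse_json_strict('{"items": [1, 2, 3]}'))           # {'items': [1, 2, 3]}
print(parse_json_strict('Sure! Here is the JSON: {"a": 1}'))  # None
```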

Sources & Terms

All training data is sourced from the permitted datasets listed in the competition rules. Augmentation used non-LLM methods only (regex, format parsers, rule-based conversion).
