StructEval Structured Output LoRA (SFT + DPO)

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, via Unsloth), trained in two stages: SFT followed by DPO.

This repository contains LoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured-output accuracy across five formats: JSON, YAML, XML, TOML, and CSV.

The model emits the structured data directly, without Chain-of-Thought reasoning, which reduces both parse failures and wasted tokens.

Training Pipeline

Stage 1: SFT (Supervised Fine-Tuning)

  • ~30,000 samples with assistant-only loss masking

Stage 2: DPO (Direct Preference Optimization)

  • 4,040 preference pairs (chosen vs rejected)
  • Beta: 0.1
  • Learning rate: 5e-7

Training Data

SFT Stage:

  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • daichira/structured-hard-sft-4k
  • daichira/structured-5k-mix-sft
  • daichira/structured-3k-mix-sft
  • u-10bei/structured_data_with_cot_dataset_512_v2
  • Rule-based conversion pair augmentation

DPO Stage:

  • u-10bei/dpo-dataset-qwen-cot (4,040 preference pairs)

Data augmentation (non-LLM, rule-based):

  • CoT extraction: removed reasoning text, kept structured output only
  • Rule-based format conversion pairs between all five formats
  • Gap filling for underrepresented task types (Text-to-XML, YAML-to-XML)
  • Random structure generation for diversity
  • Rebalancing to match evaluation distribution
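The augmentation scripts themselves are not published; as an illustration, the CoT-extraction step can be sketched as keeping only the final fenced block of a reasoning-heavy completion (the fence pattern and helper name below are assumptions, not the repository's actual code):

```python
import re

def strip_cot(completion: str) -> str:
    """Keep only the structured payload: the last fenced code block if one
    exists, otherwise the completion unchanged."""
    blocks = re.findall(r"```[a-z]*\n(.*?)```", completion, flags=re.DOTALL)
    return blocks[-1].strip() if blocks else completion.strip()

cot = "Let me think step by step...\nThe answer is:\n```json\n{\"ok\": true}\n```"
print(strip_cot(cot))  # {"ok": true}
```

Applying such a filter to CoT-style datasets yields direct prompt-to-output pairs, matching the no-reasoning target behavior described above.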

Training Configuration

SFT Stage:

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit) via Unsloth
  • Max sequence length: 4096
  • Epochs: 2
  • Learning rate: 2e-5 (cosine schedule, warmup 10%)
  • LoRA: r=128, alpha=256
  • Target modules: q/k/v/o/gate/up/down projections
  • Batch size: per-device 2 × gradient accumulation 8 = effective 16
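The exact training scripts are not released; a minimal sketch of how the SFT stage could be reproduced with Unsloth and TRL follows. Hyperparameters match the list above; everything else (dataset variable, output directory, argument names beyond those hyperparameters) is an assumption:

```python
from unsloth import FastLanguageModel
from trl import SFTConfig, SFTTrainer

# Load the base model in 4-bit for QLoRA training.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="Qwen/Qwen3-4B-Instruct-2507",
    max_seq_length=4096,
    load_in_4bit=True,
)
# Attach LoRA adapters with the configuration listed above.
model = FastLanguageModel.get_peft_model(
    model,
    r=128,
    lora_alpha=256,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
trainer = SFTTrainer(
    model=model,
    processing_class=tokenizer,
    train_dataset=sft_dataset,  # assumed: the mixed SFT data described above
    args=SFTConfig(
        num_train_epochs=2,
        learning_rate=2e-5,
        lr_scheduler_type="cosine",
        warmup_ratio=0.1,
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        output_dir="sft-out",
    ),
)
trainer.train()
```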

DPO Stage:

  • Epochs: 1
  • Learning rate: 5e-7
  • Beta: 0.1
  • Max length: 1024
  • Max prompt length: 512
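The DPO stage continues from the SFT checkpoint; a hedged sketch using TRL (the `model`, `tokenizer`, and `dpo_dataset` variables are assumed to carry over from an SFT setup like the one above):

```python
from trl import DPOConfig, DPOTrainer

# Preference-tune the SFT adapter on chosen/rejected pairs.
trainer = DPOTrainer(
    model=model,                 # assumed: the SFT-stage PEFT model
    processing_class=tokenizer,
    train_dataset=dpo_dataset,   # assumed: u-10bei/dpo-dataset-qwen-cot pairs
    args=DPOConfig(
        num_train_epochs=1,
        learning_rate=5e-7,
        beta=0.1,
        max_length=1024,
        max_prompt_length=512,
        output_dir="dpo-out",
    ),
)
trainer.train()
```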

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "perryhsb/structeval-qwen3-4b-lora-dpo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
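Since the adapter targets directly parseable output, a thin validation helper is enough to check completions at inference time (the helper below is an illustrative sketch, not part of this repository):

```python
import json

def parse_json_strict(output: str):
    """Return the parsed object if the completion is directly valid JSON,
    otherwise None (the parse failure this adapter is trained to avoid)."""
    try:
        return json.loads(output)
    except json.JSONDecodeError:
        return None

print(parse_json_strict('{"items": [1, 2, 3]}'))           # {'items': [1, 2, 3]}
print(parse_json_strict('Sure! Here is the JSON: {"a": 1}'))  # None
```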

Sources & Terms

All training data is sourced from the permitted datasets listed in the competition rules. Augmentation used non-LLM methods only (regex, format parsers, rule-based conversion).
