# StructEval Structured Output LoRA (SFT + DPO)
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, via Unsloth) in two stages: SFT followed by DPO.

**Note:** this repository contains the LoRA adapter weights only; the base model must be loaded separately.
## Training Objective
This adapter is trained to improve structured output accuracy across five formats: JSON, YAML, XML, TOML, and CSV. The model emits the structured data directly, without Chain-of-Thought reasoning, which reduces both parse failures and wasted tokens.
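Because the model emits the structured payload directly, a downstream consumer can typically parse the raw completion without first stripping reasoning text. A minimal sketch (the completion string below is illustrative, not actual model output):

```python
import json

# Illustrative raw completion: the adapter is trained to emit the
# structured payload directly, with no Chain-of-Thought preamble.
completion = '{"name": "widget", "price": 9.99, "tags": ["a", "b"]}'

# With no reasoning text to strip, a plain parse suffices.
data = json.loads(completion)
print(data["name"])  # -> widget
```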
## Training Pipeline

### Stage 1: SFT (Supervised Fine-Tuning)
- ~30,000 samples with assistant-only loss masking
### Stage 2: DPO (Direct Preference Optimization)
- 4,040 preference pairs (chosen vs rejected)
- Beta: 0.1
- Learning rate: 5e-7
## Training Data

**SFT stage:**
- u-10bei/structured_data_with_cot_dataset_512_v4
- u-10bei/structured_data_with_cot_dataset_512_v5
- daichira/structured-hard-sft-4k
- daichira/structured-5k-mix-sft
- daichira/structured-3k-mix-sft
- u-10bei/structured_data_with_cot_dataset_512_v2
- Rule-based conversion pair augmentation
**DPO stage:**
- u-10bei/dpo-dataset-qwen-cot (4,040 preference pairs)
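For reference, DPO preference data is conventionally stored as prompt/chosen/rejected triples (the format TRL's `DPOTrainer` consumes). The field values below are invented for illustration, not drawn from the dataset:

```python
# Hypothetical preference pair in the prompt/chosen/rejected convention;
# the values are invented examples, not real dataset records.
pair = {
    "prompt": "Convert to JSON: name=widget, price=9.99",
    # Preferred: direct structured output, no reasoning preamble.
    "chosen": '{"name": "widget", "price": 9.99}',
    # Dispreferred: Chain-of-Thought text wrapped around the payload.
    "rejected": 'Let me think step by step... {"name": "widget", "price": 9.99}',
}
```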
**Data augmentation (non-LLM, rule-based):**
- CoT extraction: removed reasoning text, kept structured output only
- Rule-based format conversion pairs between all 5 formats
- Gap filling for underrepresented task types (Text-to-XML, YAML-to-XML)
- Random structure generation for diversity
- Rebalancing to match evaluation distribution
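The rule-based format-conversion pairs above need no LLM: a parser reads one format and a serializer writes another. A minimal sketch using only the standard library (the `json_to_csv` helper and the sample record are hypothetical, shown for one of the format directions):

```python
import csv
import io
import json

def json_to_csv(json_text: str) -> str:
    """Convert a JSON list of flat objects to CSV, rule-based (no LLM).

    Hypothetical helper illustrating how format-conversion training
    pairs can be generated mechanically.
    """
    rows = json.loads(json_text)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# One mechanically generated JSON-to-CSV training pair.
src = '[{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]'
pair = {"input": src, "target": json_to_csv(src)}
```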
## Training Configuration

**SFT stage:**
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: QLoRA (4-bit) via Unsloth
- Max sequence length: 4096
- Epochs: 2
- Learning rate: 2e-5 (cosine schedule, warmup 10%)
- LoRA: r=128, alpha=256
- Target modules: q/k/v/o/gate/up/down projections
- Batch size: 2 x 8 gradient accumulation = effective 16
**DPO stage:**
- Epochs: 1
- Learning rate: 5e-7
- Beta: 0.1
- Max length: 1024
- Max prompt length: 512
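As a rough guide, the hyperparameters above map onto TRL-style training configs roughly as follows. This is a sketch only, not the actual training script; exact argument names vary across trl/Unsloth versions and are an assumption here:

```python
# Sketch: the hyperparameters above expressed as TRL-style configs.
# Argument names are assumptions; they vary across trl versions.
from trl import SFTConfig, DPOConfig

sft_args = SFTConfig(
    num_train_epochs=2,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # effective batch size 16
    max_seq_length=4096,
)

dpo_args = DPOConfig(
    num_train_epochs=1,
    learning_rate=5e-7,
    beta=0.1,
    max_length=1024,
    max_prompt_length=512,
)
```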
## Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "perryhsb/structeval-qwen3-4b-lora-dpo"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Attach the LoRA adapter to the base model.
model = PeftModel.from_pretrained(model, adapter)

# Example prompt; the model should reply with the structured payload directly.
messages = [{"role": "user", "content": "Convert to JSON: name=widget, price=9.99"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
## Sources & Terms

All training data is sourced from datasets permitted by the competition rules. Only non-LLM augmentation methods were used (regex, format parsers, and rule-based conversion).