Text Generation
PEFT
Safetensors
English
qlora
lora
structured-output

unsloth/Qwen3-4B StructEval-T Optimized (v5 + Hard Mix)

This repository provides a DoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth).

This repository contains DoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).

Loss is applied only to the final assistant output; the intermediate reasoning (chain-of-thought) tokens are masked out of the loss computation.
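The masking idea can be sketched in plain Python (assumptions: the assistant text follows the "Reasoning ... Output: ..." pattern described below, and -100 is the ignore index of the cross-entropy loss, as in PyTorch; the function and variable names are illustrative, not from the training code):

```python
IGNORE_INDEX = -100  # tokens with this label are ignored by cross-entropy

def mask_reasoning(token_ids, marker_ids):
    """Return labels in which every token up to and including the
    'Output:' marker is ignored, so loss is computed only on the
    final structured output."""
    labels = list(token_ids)
    # Find the first occurrence of the marker token sequence.
    for i in range(len(token_ids) - len(marker_ids) + 1):
        if token_ids[i:i + len(marker_ids)] == marker_ids:
            end = i + len(marker_ids)
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    return labels  # no marker found: leave labels untouched

# Toy example with integer "token ids"; 7, 8 stand in for "Output:".
print(mask_reasoning([1, 2, 3, 7, 8, 4, 5], [7, 8]))
# → [-100, -100, -100, -100, -100, 4, 5]
```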

Model Selection

  • Base Model: unsloth/Qwen3-4B-Instruct-2507
  • Reason: This model was selected for its balance between reasoning capabilities and efficiency on T4 GPUs. The Unsloth version allows for a larger MAX_SEQ_LEN (1024), which is crucial for processing long structural requirements in the StructEval-T benchmark.

Dataset Strategy (Strategy B)

To achieve both high formatting accuracy and complex reasoning, a hybrid dataset approach was used:

  1. u-10bei/structured_data_with_cot_dataset_512_v5: Used as the "Base of Excellence" for its clean, high-quality instruction-following samples.
  2. daichira/structured-hard-sft-4k: Integrated to improve robustness against deeply nested structures and long-context constraints.
  • Mixing Ratio: 7:3 (v5:Hard) to maintain stability while enhancing peak reasoning performance.
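The 7:3 mix can be sketched in plain Python (assumptions: the datasets are materialized as lists of examples, and the function and parameter names are illustrative; the actual pipeline details were not published):

```python
import random

def mix_datasets(v5, hard, n_samples, ratio=(7, 3), seed=42):
    """Draw n_samples examples, ratio[0]:ratio[1] from v5 vs. hard,
    then shuffle with a fixed seed for reproducibility."""
    n_v5 = round(n_samples * ratio[0] / sum(ratio))
    n_hard = n_samples - n_v5
    rng = random.Random(seed)
    mixed = rng.sample(v5, n_v5) + rng.sample(hard, n_hard)
    rng.shuffle(mixed)
    return mixed
```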

Preprocessing & Data Engineering

Unlike simple fine-tuning, the following programmatic enhancements were applied to the training data:

  • Marker Unification: Every assistant response was standardized to follow the Reasoning... Output: {Structured Data} pattern to maximize the effectiveness of CoT masking.
  • Format Normalization: Applied regex-based cleaning to ensure date formats (ISO-8601) and numerical types are strictly consistent with the StructEval-T evaluation criteria.
  • Column Pruning: Stripped unnecessary metadata columns to prevent training noise and focus purely on conversational instruction-following.
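The date normalization step might look like the following sketch (assumption: source dates appear as MM/DD/YYYY; the exact cleaning rules used in training were not published):

```python
import re

# Match dates like 3/7/2024 or 12/31/2025.
DATE_RE = re.compile(r"\b(\d{1,2})/(\d{1,2})/(\d{4})\b")

def normalize_dates(text):
    """Rewrite MM/DD/YYYY dates to ISO-8601 (YYYY-MM-DD)."""
    return DATE_RE.sub(
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        text,
    )

print(normalize_dates('{"due": "3/7/2024"}'))  # → {"due": "2024-03-07"}
```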

Hyperparameters & Optimization

  • Method: DoRA (Weight-Decomposed Low-Rank Adaptation) was used instead of standard LoRA to allow for higher-capacity learning of complex syntax (TOML/XML).
  • Rank/Alpha: r=64 / alpha=128 to capture the intricate patterns of structured data.
  • Learning Rate: 5e-5, optimized for 1 epoch to prevent overfitting while ensuring the model adopts the new 'Output:' marker.
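A PEFT configuration matching this description might look like the following sketch (assumptions: the target modules listed are the standard Qwen attention/MLP projections, and the dropout/bias settings are illustrative defaults; only DoRA, r=64, and alpha=128 are stated in this section of the card):

```python
from peft import LoraConfig

peft_config = LoraConfig(
    r=64,
    lora_alpha=128,
    use_dora=True,          # Weight-Decomposed LoRA (DoRA)
    lora_dropout=0.0,       # assumed default, not stated in the card
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```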

Training Configuration

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit)
  • Max sequence length: 1024
  • Epochs: 1
  • Learning rate: 5e-05
  • DoRA: r=32, alpha=64

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "unsloth/Qwen3-4B-Instruct-2507"
adapter = "Shion1124/qwen3-4b-struct-lora"

# Load the base model, then attach the DoRA adapter weights on top.
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)
```

Sources & Terms (IMPORTANT)

Training data: u-10bei/structured_data_with_cot_dataset_512_v5, daichira/structured-hard-sft-4k

This model was trained on a custom mixture (Strategy B) of the two datasets listed above.

Dataset Licenses: the datasets are used and distributed under their respective permissive licenses (MIT License / Apache-2.0). Users of this model must comply with the original terms provided by the dataset authors (including preservation of copyright notices) as well as the base model's terms of use.

Model Compliance: This model is a derivative work of Qwen3-4B-Instruct-2507. Users must adhere to the original license terms and usage policies set by the Qwen team.
