Text Generation
PEFT
Safetensors
English
qlora
lora
structured-output

unsloth/Qwen3-4B StructEval-T Optimized (v5 + Hard Mix)

This repository provides a DoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, Unsloth).

This repository contains DoRA adapter weights only. The base model must be loaded separately.

Training Objective

This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).

Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked.
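The masking idea can be sketched as follows. This is an illustrative simplification, not the actual training code: the helper name, the toy token ids, and the assumption that the marker appears as a contiguous token subsequence are all mine.

```python
IGNORE_INDEX = -100  # label value that Hugging Face loss functions skip


def mask_before_marker(input_ids, marker_ids):
    """Return labels where everything up to and including the marker
    token sequence is ignored, so only the final structured output
    contributes to the loss."""
    labels = list(input_ids)
    n, m = len(input_ids), len(marker_ids)
    for i in range(n - m + 1):
        if input_ids[i:i + m] == marker_ids:
            for j in range(i + m):
                labels[j] = IGNORE_INDEX
            break
    return labels


# Toy example: ids 7, 8 stand in for the tokenized "Output:" marker.
labels = mask_before_marker([1, 2, 3, 7, 8, 4, 5], marker_ids=[7, 8])
# → [-100, -100, -100, -100, -100, 4, 5]
```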

Model Selection

  • Base Model: unsloth/Qwen3-4B-Instruct-2507
  • Reason: This model was selected for its balance between reasoning capabilities and efficiency on T4 GPUs. The Unsloth version allows for a larger MAX_SEQ_LEN (1024), which is crucial for processing long structural requirements in the StructEval-T benchmark.

Dataset Strategy (Strategy B)

To achieve both high formatting accuracy and complex reasoning, a hybrid dataset approach was used:

  1. u-10bei/structured_data_with_cot_dataset_512_v5: Used as the "Base of Excellence" for its clean, high-quality instruction-following samples.
  2. daichira/structured-hard-sft-4k: Integrated to improve robustness against deeply nested structures and long-context constraints.
  • Mixing Ratio: 7:3 (v5:Hard) to maintain stability while enhancing peak reasoning performance.
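A minimal sketch of the 7:3 mix, assuming a simple deterministic interleave (7 base samples, then 3 hard samples); the actual mixing procedure used during training is not published, so this is an assumption:

```python
from itertools import islice


def interleave_7_3(base, hard):
    """Interleave two datasets in 7:3 chunks (v5 base : hard mix)
    until both are exhausted. A simplification of weighted sampling."""
    out, b, h = [], iter(base), iter(hard)
    while True:
        chunk = list(islice(b, 7)) + list(islice(h, 3))
        if not chunk:
            break
        out.extend(chunk)
    return out


mixed = interleave_7_3(list(range(14)), ["a", "b", "c", "d", "e", "f"])
```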

Preprocessing & Data Engineering

Unlike simple fine-tuning, the following programmatic enhancements were applied to the training data:

  • Marker Unification: Every assistant response was standardized to follow the Reasoning... Output: {Structured Data} pattern to maximize the effectiveness of CoT masking.
  • Format Normalization: Applied regex-based cleaning to ensure date formats (ISO-8601) and numerical types are strictly consistent with the StructEval-T evaluation criteria.
  • Column Pruning: Stripped unnecessary metadata columns to prevent training noise and focus purely on conversational instruction-following.
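As an example of the regex-based format normalization, a cleaning pass might rewrite US-style dates to ISO-8601. The exact patterns used during training are not published; this pattern and function name are illustrative assumptions:

```python
import re

# Rewrite MM/DD/YYYY dates to ISO-8601 (YYYY-MM-DD). Illustrative only:
# the actual training pipeline's regexes are not published.
DATE_RE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")


def normalize_dates(text: str) -> str:
    return DATE_RE.sub(lambda m: f"{m.group(3)}-{m.group(1)}-{m.group(2)}", text)


normalize_dates('{"date": "03/15/2024"}')  # → '{"date": "2024-03-15"}'
```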

Hyperparameters & Optimization

  • Method: DoRA (Weight-Decomposed Low-Rank Adaptation) was used instead of standard LoRA to allow for higher-capacity learning of complex syntax (TOML/XML).
  • Rank/Alpha: r=64 / alpha=128 to capture the intricate patterns of structured data.
  • Learning Rate: 5e-5, optimized for 1 epoch to prevent overfitting while ensuring the model adopts the new 'Output:' marker.

Training Configuration

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit)
  • Max sequence length: 1024
  • Epochs: 1
  • Learning rate: 5e-05
  • DoRA: r=32, alpha=64
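In PEFT, DoRA is enabled by setting `use_dora=True` on a standard `LoraConfig`. A sketch of such a configuration is below; note that this card lists r=64/alpha=128 under Hyperparameters but r=32/alpha=64 here, so the values and the `target_modules` choice (the usual Qwen projection layers) are assumptions:

```python
from peft import LoraConfig

# DoRA configuration sketch. Rank/alpha follow the Training Configuration
# list above; target_modules are assumed, not confirmed by the card.
config = LoraConfig(
    r=32,
    lora_alpha=64,
    use_dora=True,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```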

Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "unsloth/Qwen3-4B-Instruct-2507"
# adapter = "your_id/your-repo"
adapter = "Shion1124/qwen3-4b-struct-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the DoRA adapter weights to the base model.
model = PeftModel.from_pretrained(model, adapter)
```

Sources & Terms (IMPORTANT)

This model was trained on a custom mixture of the following datasets (Strategy B):

  1. u-10bei/structured_data_with_cot_dataset_512_v5
  2. daichira/structured-hard-sft-4k

Dataset Licenses: The datasets are used and distributed under their respective permissive licenses (MIT / Apache-2.0). Users of this model must comply with the original terms provided by the dataset authors, including the preservation of copyright notices.

Model Compliance: This model is a derivative work of Qwen3-4B-Instruct-2507. Users must adhere to the original license terms and usage policies set by the Qwen team.
