unsloth/Qwen3-4B StructEval-T Optimized (v5 + Hard Mix)
This repository provides a DoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using QLoRA (4-bit, via Unsloth). It contains the adapter weights only; the base model must be loaded separately.
Training Objective
This adapter is trained to improve structured output accuracy (JSON / YAML / XML / TOML / CSV).
Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked.
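The masking described above can be sketched in plain Python. This is an illustrative helper, not the actual training code: the `mask_reasoning` name is hypothetical, and the `-100` ignore index follows the standard Hugging Face convention for excluding tokens from the cross-entropy loss.

```python
IGNORE_INDEX = -100  # tokens with this label are excluded from the loss


def mask_reasoning(token_ids, marker_ids):
    """Return labels where everything up to and including the 'Output:'
    marker is ignored, so only the final structured output is trained on."""
    labels = list(token_ids)
    # locate the tokenized marker subsequence in the response
    for i in range(len(token_ids) - len(marker_ids) + 1):
        if token_ids[i:i + len(marker_ids)] == marker_ids:
            end = i + len(marker_ids)
            labels[:end] = [IGNORE_INDEX] * end
            return labels
    return labels  # marker not found: fall back to training on everything


# toy example: ids 7, 8 stand in for the tokenized "Output:" marker
labels = mask_reasoning([1, 2, 3, 7, 8, 4, 5], [7, 8])
```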
Model Selection
- Base Model: unsloth/Qwen3-4B-Instruct-2507
- Reason: This model was selected for its balance between reasoning capability and efficiency on T4 GPUs. The Unsloth version allows a larger MAX_SEQ_LEN (1024), which is crucial for processing the long structural requirements in the StructEval-T benchmark.
Dataset Strategy (Strategy B)
To achieve both high formatting accuracy and complex reasoning, a hybrid dataset approach was used:
- u-10bei/structured_data_with_cot_dataset_512_v5: Used as the "Base of Excellence" for its clean, high-quality instruction-following samples.
- daichira/structured-hard-sft-4k: Integrated to improve robustness against deeply nested structures and long-context constraints.
- Mixing Ratio: 7:3 (v5:Hard) to maintain stability while enhancing peak reasoning performance.
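The 7:3 mixing logic can be sketched as below. The exact pipeline is not published; `mix_datasets`, the pool sizes, and the seed are illustrative assumptions.

```python
import random


def mix_datasets(v5_samples, hard_samples, ratio=(7, 3), seed=42):
    """Draw a ratio-proportional mixture (v5 : hard), capped by the
    smaller pool, then shuffle. A sketch of the mixing strategy only."""
    rng = random.Random(seed)
    units = min(len(v5_samples) // ratio[0], len(hard_samples) // ratio[1])
    mixed = (rng.sample(v5_samples, units * ratio[0])
             + rng.sample(hard_samples, units * ratio[1]))
    rng.shuffle(mixed)
    return mixed


# hypothetical pool sizes, just to show the 7:3 proportion
mixed = mix_datasets([f"v5_{i}" for i in range(700)],
                     [f"hard_{i}" for i in range(300)])
```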
Preprocessing & Data Engineering
Unlike simple fine-tuning, the following programmatic enhancements were applied to the training data:
- Marker Unification: Every assistant response was standardized to follow the "Reasoning... Output: {Structured Data}" pattern to maximize the effectiveness of CoT masking.
- Format Normalization: Applied regex-based cleaning to ensure date formats (ISO-8601) and numerical types are strictly consistent with the StructEval-T evaluation criteria.
- Column Pruning: Stripped unnecessary metadata columns to prevent training noise and focus purely on conversational instruction-following.
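A minimal sketch of the regex-based format normalization step. The exact rules used in training are not published; the patterns below (slash-date to ISO-8601, unquoting bare integers) are illustrative assumptions.

```python
import re

DATE_SLASH = re.compile(r"\b(\d{4})/(\d{1,2})/(\d{1,2})\b")  # 2024/1/5
QUOTED_INT = re.compile(r'"(-?\d+)"')                        # "42" as a string


def normalize(text):
    """Illustrative cleaning pass: rewrite YYYY/MM/DD dates as ISO-8601
    and unquote bare integers so JSON values keep a numeric type."""
    text = DATE_SLASH.sub(
        lambda m: f"{m.group(1)}-{int(m.group(2)):02d}-{int(m.group(3)):02d}",
        text,
    )
    text = QUOTED_INT.sub(r"\1", text)
    return text


normalize('{"date": "2024/1/5", "count": "42"}')
```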
Hyperparameters & Optimization
- Method: DoRA (Weight-Decomposed Low-Rank Adaptation) was used instead of standard LoRA to allow for higher-capacity learning of complex syntax (TOML/XML).
- Rank/Alpha: r=64 / alpha=128 to capture the intricate patterns of structured data.
- Learning Rate: 5e-5, optimized for 1 epoch to prevent overfitting while ensuring the model adopts the new 'Output:' marker.
Training Configuration
- Base model: unsloth/Qwen3-4B-Instruct-2507
- Method: QLoRA (4-bit)
- Max sequence length: 1024
- Epochs: 1
- Learning rate: 5e-05
- DoRA: r=32, alpha=64
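The configuration above can be expressed as a PEFT config, a sketch assuming peft's `use_dora` flag (available in peft >= 0.9). The target module list is a common choice for Qwen-style models and is an assumption, not taken from the training script.

```python
from peft import LoraConfig

# DoRA configuration matching the Training Configuration values above;
# target_modules is an assumed, typical list for Qwen-family models.
peft_config = LoraConfig(
    r=32,
    lora_alpha=64,
    use_dora=True,  # weight-decomposed adaptation instead of plain LoRA
    lora_dropout=0.0,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)
```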
Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "unsloth/Qwen3-4B-Instruct-2507"
# adapter = "your_id/your-repo"
adapter = "Shion1124/qwen3-4b-struct-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

# Example: generate structured output via the chat template
messages = [{"role": "user", "content": "Convert to JSON: name Alice, age 30"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```
Sources & Terms (IMPORTANT)
Training data: u-10bei/structured_data_with_cot_dataset_512_v5, daichira/structured-hard-sft-4k
This model was trained on a custom mixture of these datasets (Strategy B).
Dataset Licenses: The datasets are used and distributed under their respective permissive licenses (MIT License / Apache-2.0). Users of this model must comply with the terms of those licenses, including preservation of copyright notices, as provided by the dataset authors.
Model Compliance: This model is a derivative work of Qwen3-4B-Instruct-2507. Users must adhere to the original license terms and usage policies set by the Qwen team.