qwen3-4b-structured-output-lora

This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization via Unsloth). Only the LoRA adapter weights are included; the base model must be loaded separately.

Training Objective

This adapter is trained to improve structured-output accuracy across JSON, YAML, XML, TOML, and CSV. Loss is applied only to the final assistant output; intermediate reasoning (chain of thought) is masked. Output supervision uses the after_marker mode with the markers </think>, Output:, OUTPUT:, Final:, Answer:, Result:, and Response:.
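The after_marker rule can be sketched as follows. This is a simplified illustration operating on decoded text, not the actual training code: the real trainer masks token-level labels (typically by setting them to -100) rather than slicing strings.

```python
# Simplified sketch of "after_marker" supervision: only text after the first
# marker receives loss; everything before it (the chain of thought) is masked.
MARKERS = ["</think>", "Output:", "OUTPUT:", "Final:",
           "Answer:", "Result:", "Response:"]

def supervised_span(completion: str, markers=MARKERS) -> str:
    """Return the part of the completion that would be supervised."""
    spans = [(completion.find(m), completion.find(m) + len(m))
             for m in markers if m in completion]
    if not spans:
        return completion  # no marker found: supervise the whole completion
    _, end = min(spans)    # earliest-occurring marker wins
    return completion[end:].lstrip()
```

For example, `supervised_span('Let me reason.</think>{"a": 1}')` keeps only the JSON payload for loss computation.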

Training Configuration

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit, Unsloth)
  • Max sequence length: 2048
  • Epochs: 1
  • Learning rate: 1e-6
  • LoRA: r=64, alpha=128
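As a config fragment, the listed LoRA hyperparameters might be expressed with PEFT roughly as below. Note that target_modules is an assumption (the card does not list them); the values shown are typical attention and MLP projections for Qwen-family models.

```python
from peft import LoraConfig

# Hypothetical reconstruction from the hyperparameters above.
# target_modules is an assumption, not taken from the model card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```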

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "unsloth/Qwen3-4B-Instruct-2507"
adapter = "uchkw/qwen3-4b-structured-output-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Training Data / Sources & License (IMPORTANT)

FIRST TRAINING: The adapter was first trained on a mix of datasets totaling 9,094 samples.

SECOND TRAINING: Training then continued on a rule-based JSON→TOML augmentation dataset derived from the u-10bei structured_data_with_cot_dataset_* series (train + validation splits) with inline-table expansion. TOML validity was checked via tomllib, and no LLM was used for generation. This second phase used 7,146 samples.

THIRD TRAINING: The adapter was then further trained on a mixed dataset of 20,000 samples that includes StructEval-style Text-to-TOML synthetic data and invalid-to-valid repair data.

  • Base SFT data was extracted from the datasets used in the first training phase.
  • Additional TOML-focused data was extracted from the datasets used in the second training phase.
  • The StructEval-style Text-to-TOML set and invalid-to-valid repair set were generated locally by scripts, with no LLM used for data generation.
  • TOML validity was checked with tomllib, and structurally invalid samples were excluded.
  • Final training used an 80:15:5 mix ratio (base : StructEval-style synthetic : repair), with ratio-adjusted counts of 16,000 base, 3,000 StructEval-style synthetic, and 1,000 repair (20,000 total).
  • Samples exceeding max_seq_length=2048 were excluded.
  • Licenses: The training data is subject to MIT (u-10bei) and CC-BY-4.0 (daichira). Use and redistribution must comply with both.
  • Attribution: CC-BY-4.0 requires appropriate credit to the dataset author. When distributing this model or derivatives, provide attribution for daichira/structured-hard-sft-4k as required by CC-BY-4.0.
  • Compliance: Users must also comply with the base model's terms of use.
