qwen3-4b-structured-output-lora

This repository provides a LoRA adapter fine-tuned from unsloth/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization via Unsloth). Only the LoRA adapter weights are included; the base model must be loaded separately.

Training Objective

This adapter is trained to improve structured-output accuracy across JSON, YAML, XML, TOML, and CSV. Loss is applied only to the final assistant output; intermediate reasoning (chain of thought) is masked. Output supervision uses the after_marker mode with the markers </think>, Output:, OUTPUT:, Final:, Answer:, Result:, and Response:.
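The after_marker rule can be sketched as follows. This is a simplified illustration operating on decoded text, not the actual training code: the real trainer masks token-level labels (typically by setting them to -100) rather than slicing strings.

```python
# Simplified sketch of "after_marker" supervision: only text after the first
# marker receives loss; everything before it (the chain of thought) is masked.
MARKERS = ["</think>", "Output:", "OUTPUT:", "Final:",
           "Answer:", "Result:", "Response:"]

def supervised_span(completion: str, markers=MARKERS) -> str:
    """Return the part of the completion that would be supervised."""
    spans = [(completion.find(m), completion.find(m) + len(m))
             for m in markers if m in completion]
    if not spans:
        return completion  # no marker found: supervise the whole completion
    _, end = min(spans)    # earliest-occurring marker wins
    return completion[end:].lstrip()
```

For example, `supervised_span('Let me reason.</think>{"a": 1}')` keeps only the JSON payload for loss computation.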

Training Configuration

  • Base model: unsloth/Qwen3-4B-Instruct-2507
  • Method: QLoRA (4-bit, Unsloth)
  • Max sequence length: 2048
  • Epochs: 1
  • Learning rate: 1e-6
  • LoRA: r=64, alpha=128
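As a config fragment, the listed LoRA hyperparameters might be expressed with PEFT roughly as below. Note that target_modules is an assumption (the card does not list them); the values shown are typical attention and MLP projections for Qwen-family models.

```python
from peft import LoraConfig

# Hypothetical reconstruction from the hyperparameters above.
# target_modules is an assumption, not taken from the model card.
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```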

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "unsloth/Qwen3-4B-Instruct-2507"
adapter = "uchkw/qwen3-4b-structured-output-lora"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter)

Training Data / Sources & License (IMPORTANT)

FIRST TRAINING: The adapter was first trained on a mix of datasets totaling 9,094 samples.

SECOND TRAINING: Training then continued on a rule-based JSON→TOML augmentation dataset derived from the u-10bei structured_data_with_cot_dataset_* series (train + validation splits) with inline-table expansion. TOML validity was checked via tomllib, and no LLM was used for generation. This second phase used 7,146 samples.

THIRD TRAINING: The adapter was then further trained on a mixed dataset of 20,000 samples that includes StructEval-style Text-to-TOML synthetic data and invalid-to-valid repair data.

  • Base SFT data was extracted from the datasets used in the first training phase.
  • Additional TOML-focused data was extracted from the datasets used in the second training phase.
  • The StructEval-style Text-to-TOML set and invalid-to-valid repair set were generated locally by scripts, with no LLM used for data generation.
  • TOML validity was checked with tomllib, and structurally invalid samples were excluded.
  • Final training used an 80:15:5 mix ratio (base : StructEval-style synthetic : repair), with ratio-adjusted counts of 16,000 base, 3,000 StructEval-style synthetic, and 1,000 repair (20,000 total).
  • Samples exceeding max_seq_length=2048 were excluded.
  • Licenses: The training data is subject to MIT (u-10bei) and CC-BY-4.0 (daichira). Use and redistribution must comply with both.
  • Attribution: CC-BY-4.0 requires appropriate credit to the dataset author. When distributing this model or derivatives, provide attribution for daichira/structured-hard-sft-4k as required by CC-BY-4.0.
  • Compliance: Users must also comply with the base model's terms of use.
