# qwen3-4b-structured-output-lora-v3 (FIXED)
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using standard PeFT + Transformers with 4-bit quantization.
⚠️ This repository contains LoRA adapter weights only. The base model must be loaded separately.
## Version 3: Critical Template Alignment Fix
This version fixes the critical template mismatch that caused v1/v2 to output explanatory text:
### Key Fixes
- Template Alignment: `add_generation_prompt=True` (matches vLLM inference)
- User-Ending Prompts: training prompts end with the user message (not the assistant message)
- Response-Only Loss: the loss is applied only to the response tokens; the prompt is masked
- Proper Learning Rate: 2e-06 (stronger than v2's 5e-07)
### Why v1/v2 Failed
- v1/v2: used `add_generation_prompt=False` during training
- vLLM: uses `add_generation_prompt=True` during inference
- Result: the model saw a different prompt format at inference time → it emitted explanatory text
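The mismatch can be illustrated with a minimal ChatML renderer. This is a simplified stand-in for `tokenizer.apply_chat_template` (the real Qwen template also handles default system prompts, tools, etc.), but it shows why the two flag values produce different prompts:

```python
def render(messages, add_generation_prompt):
    # Minimal ChatML sketch: each message becomes an <|im_start|>...<|im_end|> turn.
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Opens the assistant turn so the model continues with the answer.
        text += "<|im_start|>assistant\n"
    return text

msgs = [{"role": "user", "content": "Emit the record as JSON."}]
train_v1 = render(msgs, add_generation_prompt=False)  # what v1/v2 trained on
infer = render(msgs, add_generation_prompt=True)      # what vLLM sends
print(train_v1 != infer)  # the two prompts differ, hence the format shift
```

With the flag set to `False`, the prompt lacks the opening `<|im_start|>assistant` turn, so the model trained on v1/v2 prompts never saw the exact string vLLM sends at inference time.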
### v3 Results
- Training loss: ~1.12-1.37 (vs v1/v2's ~1.96)
- Expected: <1% explanatory text rate (vs v1's 28.7%, v2's 45.3%)
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA with 4-bit quantization (standard PeFT + Transformers)
- Max sequence length: 512
- Epochs: 1
- Learning rate: 2e-06 (4× higher than v2's 5e-07)
- LoRA parameters: r=64, alpha=128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch size: 2 × 8 (gradient accumulation) = 16 effective
- Training loss: ~1.12-1.37 (final)
- Training time: ~14 minutes on RTX 5090
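The configuration above corresponds roughly to the following PEFT + Transformers setup. This is a sketch, not the exact training script; argument names follow `peft.LoraConfig` and `transformers.BitsAndBytesConfig`:

```python
# Sketch of the setup implied by the configuration above (not the exact script).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
# Effective batch: per_device_train_batch_size=2 * gradient_accumulation_steps=8 = 16
```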
## Dataset
- Source: u-10bei/structured_data_with_cot_dataset_512_v2
- Preprocessing: Removed "Approach:" sections and "Output:" markers
- Size: 3,933 examples → 3,736 train / 197 validation
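The preprocessing step can be sketched as follows. The actual cleaning script for the dataset was not published, so `strip_markers` and its regexes are illustrative assumptions:

```python
import re

def strip_markers(text):
    # Illustrative assumption: drop an "Approach:" section (through the next
    # blank line) and any bare "Output:" marker line.
    text = re.sub(r"Approach:.*?(\n\n|\Z)", "", text, flags=re.DOTALL)
    text = re.sub(r"^Output:\s*$\n?", "", text, flags=re.MULTILINE)
    return text.strip()

sample = 'Approach: reason step by step.\n\nOutput:\n{"id": 1}'
print(strip_markers(sample))  # -> {"id": 1}
```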
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "astom-M/qwen3-4b-structured-output-lora-clean"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

# For vLLM inference (recommended):
# use the standard inference notebook provided by the competition organizers.
```
## Technical Details

### Template Alignment Fix
Training (v3):

```python
# Prompt: system + user messages only
prompt_text = tokenizer.apply_chat_template(
    prompt_messages,
    tokenize=False,
    add_generation_prompt=True,  # ← KEY FIX
)
# Response: assistant content (raw structured data)
# Labels: mask the prompt part; only the response contributes to the loss
```
Inference (vLLM):

```python
# Exactly matches the training format
tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # ← now aligned!
)
```
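The response-only labels used in training can be sketched at the token level. The IDs here are toy stand-ins for tokenizer output; the pattern itself (prompt positions set to -100) is the standard way to mask tokens from PyTorch's cross-entropy loss:

```python
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss (ignore_index default)

def build_labels(prompt_ids, response_ids):
    # Supervise only the response: positions covering the prompt get -100,
    # so they contribute nothing to the loss.
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Toy IDs standing in for tokenized prompt/response.
input_ids, labels = build_labels([101, 102, 103], [7, 8, 9])
print(labels)  # -> [-100, -100, -100, 7, 8, 9]
```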
## Sources & License
- Base Model: Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0)
- Training Data: u-10bei/structured_data_with_cot_dataset_512_v2 (preprocessed)
- LoRA Adapter: Apache 2.0 (same as base model)
## Notes
- Trained for Matsuo Institute LLM Course Main Competition (StructEval-T)
- Version 3: Fixed template alignment - critical fix for structured output
- Designed to output clean structured data without explanatory text
- Best used with temperature=0.0 for deterministic outputs
## Framework Versions
- PEFT 0.18.1
- Transformers 4.56.2
- PyTorch 2.10.0+cu128