qwen3-4b-structured-output-lora-v3 (FIXED)

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using standard PeFT + Transformers with 4-bit quantization.

โš ๏ธ This repository contains LoRA adapter weights only. The base model must be loaded separately.

Version 3: Critical Template Alignment Fix

This version fixes the critical template mismatch that caused v1/v2 to output explanatory text:

Key Fixes

  1. Template Alignment: add_generation_prompt=True (matches vLLM inference)
  2. User-Ending Prompts: Training prompts end with user message (not assistant)
  3. Response-Only Loss: Loss applied only to response part, prompt is masked
  4. Proper Learning Rate: 2e-06 (stronger than v2's 5e-07)
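Fix 3 (response-only loss) can be sketched in plain Python: prompt positions in the label sequence are set to -100, the index PyTorch's cross-entropy loss ignores, so gradients come only from the response tokens. The helper name and example token ids below are illustrative, not from the training code.

```python
# Sketch of response-only label masking (fix 3), assuming the prompt
# length in tokens is known. -100 is the index ignored by PyTorch's
# cross-entropy, so masked positions contribute nothing to the loss.
IGNORE_INDEX = -100

def mask_prompt_labels(input_ids, prompt_len):
    """Copy input_ids into labels, masking the prompt prefix out of the loss."""
    return [IGNORE_INDEX] * prompt_len + list(input_ids[prompt_len:])

# Example: 4 prompt tokens followed by a 3-token response.
print(mask_prompt_labels([101, 102, 103, 104, 7, 8, 9], 4))
# [-100, -100, -100, -100, 7, 8, 9]
```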

Why v1/v2 Failed

  • v1/v2: used add_generation_prompt=False during training
  • vLLM: uses add_generation_prompt=True during inference
  • Result: the model saw different prompt formats at training vs. inference → output explanatory text
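The mismatch is easiest to see with a hand-rendered ChatML-style template (Qwen's real chat template ships inside the tokenizer config; the string form below is a simplified assumption for illustration):

```python
# Simplified ChatML rendering to show what add_generation_prompt changes.
def render(messages, add_generation_prompt):
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # vLLM appends an open assistant turn so the model continues from here.
        text += "<|im_start|>assistant\n"
    return text

msgs = [{"role": "user", "content": "Emit JSON"}]
print(render(msgs, False))  # ends after the closed user turn (v1/v2 training)
print(render(msgs, True))   # ends with an open assistant turn (vLLM inference)
```

Trained on the first form but prompted with the second, the model never learned that an open assistant turn should be followed immediately by raw structured data.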

v3 Results

  • Training loss: ~1.12-1.37 (vs v1/v2's ~1.96)
  • Expected: <1% explanatory text rate (vs v1's 28.7%, v2's 45.3%)
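One simple way to estimate the explanatory-text rate quoted above, assuming JSON targets (StructEval-T also covers other formats, so a real harness would need per-format parsers; this heuristic is an illustration, not the competition's scorer):

```python
import json

def is_clean_structured(output: str) -> bool:
    """Heuristic: the whole output parses as JSON, with no surrounding prose."""
    try:
        json.loads(output.strip())
        return True
    except json.JSONDecodeError:
        return False

outputs = ['{"a": 1}', 'Sure! Here is the JSON: {"a": 1}']
rate = sum(not is_clean_structured(o) for o in outputs) / len(outputs)
print(f"explanatory-text rate: {rate:.0%}")  # explanatory-text rate: 50%
```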

Training Configuration

  • Base model: Qwen/Qwen3-4B-Instruct-2507
  • Method: LoRA with 4-bit quantization (standard PeFT + Transformers)
  • Max sequence length: 512
  • Epochs: 1
  • Learning rate: 2e-06 (4× higher than v2's 5e-07)
  • LoRA parameters: r=64, alpha=128
  • Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
  • Batch size: 2 × 8 (gradient accumulation) = 16 effective
  • Training loss: ~1.12-1.37 (final)
  • Training time: ~14 minutes on RTX 5090
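The hyperparameters above map onto a PEFT/Transformers setup roughly as follows. This is a sketch, not the exact training script: values named in the list are taken from it, while `lora_dropout`, `output_dir`, and other unlisted arguments are assumptions.

```python
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,      # assumption: not stated in the card
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="qwen3-4b-structured-output-lora-v3",  # assumption
    num_train_epochs=1,
    learning_rate=2e-6,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=8,  # 2 × 8 = 16 effective batch size
)
```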

Dataset

  • Source: u-10bei/structured_data_with_cot_dataset_512_v2
  • Preprocessing: Removed "Approach:" sections and "Output:" markers
  • Size: 3,933 examples → 3,736 train / 197 validation
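The exact preprocessing rules are not published; a hedged sketch of the stated cleanup, assuming "Approach:" sections run to the next blank line and "Output:" appears as a line-leading marker:

```python
import re

def clean_response(text: str) -> str:
    """Strip "Approach:" sections and "Output:" markers from a CoT response."""
    # Remove an "Approach:" block up to the next blank line (or end of text).
    text = re.sub(r"Approach:.*?(?:\n\n|\Z)", "", text, flags=re.DOTALL)
    # Drop a line-leading "Output:" marker, keeping what follows it.
    text = re.sub(r"^Output:\s*", "", text, flags=re.MULTILINE)
    return text.strip()

sample = 'Approach: reason step by step.\n\nOutput: {"a": 1}'
print(clean_response(sample))  # {"a": 1}
```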

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "astom-M/qwen3-4b-structured-output-lora-clean"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

# For vLLM inference (recommended):
# Use the standard inference notebook provided by competition organizers

Technical Details

Template Alignment Fix

Training (v3):

# Prompt: system + user messages only, rendered as text
prompt_text = tokenizer.apply_chat_template(
    prompt_messages,
    tokenize=False,
    add_generation_prompt=True,  # ← KEY FIX
)
# Response: assistant content (raw structured data), appended after the prompt
# Labels: prompt tokens masked with -100 so the loss covers only the response

Inference (vLLM):

# Exactly matches the training format
tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # ← now aligned
)

Sources & License

  • Base Model: Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0)
  • Training Data: u-10bei/structured_data_with_cot_dataset_512_v2 (preprocessed)
  • LoRA Adapter: Apache 2.0 (same as base model)

Notes

  • Trained for Matsuo Institute LLM Course Main Competition (StructEval-T)
  • Version 3: fixed template alignment (the critical fix for structured output)
  • Designed to output clean structured data without explanatory text
  • Best used with temperature=0.0 for deterministic outputs

Framework Versions

  • PEFT 0.18.1
  • Transformers 4.56.2
  • PyTorch 2.10.0+cu128