# qwen3-4b-structured-output-lora-v3 (FIXED)
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using standard PeFT + Transformers with 4-bit quantization.
⚠️ This repository contains LoRA adapter weights only. The base model must be loaded separately.
## Version 3: Critical Template Alignment Fix
This version fixes the critical template mismatch that caused v1/v2 to output explanatory text:
### Key Fixes
- Template Alignment: `add_generation_prompt=True` (matches vLLM inference)
- User-Ending Prompts: training prompts end with the user message (not the assistant message)
- Response-Only Loss: the loss is applied only to the response tokens; the prompt is masked
- Proper Learning Rate: 2e-06 (stronger than v2's 5e-07)
### Why v1/v2 Failed
- v1/v2: used `add_generation_prompt=False` during training
- vLLM: uses `add_generation_prompt=True` during inference
- Result: the model saw a different prompt format at inference time → it emitted explanatory text
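The mismatch can be illustrated with a minimal ChatML renderer. This is a simplified stand-in for `tokenizer.apply_chat_template` (the real Qwen template also handles default system prompts, tools, etc.), but it shows why the two flag values produce different prompts:

```python
def render(messages, add_generation_prompt):
    # Minimal ChatML sketch: each message becomes an <|im_start|>...<|im_end|> turn.
    text = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Opens the assistant turn so the model continues with the answer.
        text += "<|im_start|>assistant\n"
    return text

msgs = [{"role": "user", "content": "Emit the record as JSON."}]
train_v1 = render(msgs, add_generation_prompt=False)  # what v1/v2 trained on
infer = render(msgs, add_generation_prompt=True)      # what vLLM sends
print(train_v1 != infer)  # the two prompts differ, hence the format shift
```

With the flag set to `False`, the prompt lacks the opening `<|im_start|>assistant` turn, so the model trained on v1/v2 prompts never saw the exact string vLLM sends at inference time.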
### v3 Results
- Training loss: ~1.12-1.37 (vs v1/v2's ~1.96)
- Expected: <1% explanatory text rate (vs v1's 28.7%, v2's 45.3%)
## Training Configuration
- Base model: Qwen/Qwen3-4B-Instruct-2507
- Method: LoRA with 4-bit quantization (standard PeFT + Transformers)
- Max sequence length: 512
- Epochs: 1
- Learning rate: 2e-06 (4× higher than v2's 5e-07)
- LoRA parameters: r=64, alpha=128
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- Batch size: 2 × 8 (gradient accumulation) = 16 effective
- Training loss: ~1.12-1.37 (final)
- Training time: ~14 minutes on RTX 5090
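The configuration above corresponds roughly to the following PEFT + Transformers setup. This is a sketch, not the exact training script; argument names follow `peft.LoraConfig` and `transformers.BitsAndBytesConfig`:

```python
# Sketch of the setup implied by the configuration above (not the exact script).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb,
    device_map="auto",
)
lora = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
# Effective batch: per_device_train_batch_size=2 * gradient_accumulation_steps=8 = 16
```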
## Dataset
- Source: u-10bei/structured_data_with_cot_dataset_512_v2
- Preprocessing: Removed "Approach:" sections and "Output:" markers
- Size: 3,933 examples → 3,736 train / 197 validation
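The preprocessing step can be sketched as follows. The actual cleaning script for the dataset was not published, so `strip_markers` and its regexes are illustrative assumptions:

```python
import re

def strip_markers(text):
    # Illustrative assumption: drop an "Approach:" section (through the next
    # blank line) and any bare "Output:" marker line.
    text = re.sub(r"Approach:.*?(\n\n|\Z)", "", text, flags=re.DOTALL)
    text = re.sub(r"^Output:\s*$\n?", "", text, flags=re.MULTILINE)
    return text.strip()

sample = 'Approach: reason step by step.\n\nOutput:\n{"id": 1}'
print(strip_markers(sample))  # -> {"id": 1}
```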
## Usage
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "Qwen/Qwen3-4B-Instruct-2507"
adapter = "astom-M/qwen3-4b-structured-output-lora-clean"

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(model, adapter)

# For vLLM inference (recommended):
# use the standard inference notebook provided by the competition organizers.
```
## Technical Details

### Template Alignment Fix
Training (v3):

```python
# Prompt: system + user messages only
prompt_text = tokenizer.apply_chat_template(
    prompt_messages,
    tokenize=False,
    add_generation_prompt=True,  # ← KEY FIX
)
# Response: assistant content (raw structured data)
# Labels: mask the prompt part; only the response contributes to the loss
```
Inference (vLLM):

```python
# Exactly matches the training format
tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # ← now aligned!
)
```
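The response-only labels used in training can be sketched at the token level. The IDs here are toy stand-ins for tokenizer output; the pattern itself (prompt positions set to -100) is the standard way to mask tokens from PyTorch's cross-entropy loss:

```python
IGNORE_INDEX = -100  # ignored by PyTorch's CrossEntropyLoss (ignore_index default)

def build_labels(prompt_ids, response_ids):
    # Supervise only the response: positions covering the prompt get -100,
    # so they contribute nothing to the loss.
    input_ids = prompt_ids + response_ids
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Toy IDs standing in for tokenized prompt/response.
input_ids, labels = build_labels([101, 102, 103], [7, 8, 9])
print(labels)  # -> [-100, -100, -100, 7, 8, 9]
```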
## Sources & License
- Base Model: Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0)
- Training Data: u-10bei/structured_data_with_cot_dataset_512_v2 (preprocessed)
- LoRA Adapter: Apache 2.0 (same as base model)
## Notes
- Trained for Matsuo Institute LLM Course Main Competition (StructEval-T)
- Version 3: Fixed template alignment - critical fix for structured output
- Designed to output clean structured data without explanatory text
- Best used with temperature=0.0 for deterministic outputs
## Framework Versions
- PEFT 0.18.1
- Transformers 4.56.2
- PyTorch 2.10.0+cu128