# qwen3-4b-sft-exp10e

LoRA adapter for structured output generation (JSON, YAML, TOML, XML, CSV), based on Qwen3-4B-Instruct.
This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization).
**Note:** This repository contains LoRA adapter weights only. The base model must be loaded separately.
## Training Objective
This adapter is trained to improve structured output generation accuracy across multiple formats:
- JSON - JavaScript Object Notation
- YAML - YAML Ain't Markup Language
- TOML - Tom's Obvious, Minimal Language
- XML - eXtensible Markup Language
- CSV - Comma-Separated Values
**Training Strategy:** Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked, to improve output quality without overfitting to reasoning patterns.
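The masking scheme described above can be sketched in a few lines. The token ids and the split point are purely illustrative; the real pipeline derives the answer span from the chat template, not from a hard-coded index:

```python
# Sketch of loss masking: labels are -100 (ignored by cross-entropy)
# for the prompt and chain-of-thought, and real token ids only for
# the final assistant answer.
IGNORE_INDEX = -100

def mask_labels(input_ids, answer_start):
    """Copy input_ids into labels, masking everything before answer_start."""
    return [IGNORE_INDEX] * answer_start + list(input_ids[answer_start:])

# Example: prompt + reasoning occupy the first 5 tokens,
# the final structured answer occupies the rest.
input_ids = [101, 7, 8, 9, 10, 42, 43, 44]
labels = mask_labels(input_ids, answer_start=5)
# labels == [-100, -100, -100, -100, -100, 42, 43, 44]
```

With these labels, gradient updates come only from the structured answer tokens, which is the behavior the training strategy describes.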
## Performance

Evaluated on `public_150.json` (Matsuo Lab LLM Competition 2025):
| Format | Success Rate | Notes |
|---|---|---|
| Overall | 85.33% | - |
| JSON | 98.0% | Strong performance |
| YAML | 91.4% | Strong performance |
| CSV | 100.0% | Excellent |
| XML | 60.0% | Room for improvement |
| TOML | 60.0% | Needs improvement |
### Task-Level Analysis
- Format conversion: Strong on JSON/YAML/CSV conversions
- Challenges: CSV-to-JSON/XML/YAML, YAML-to-XML, Text-to-TOML
- Strengths: CSV generation (100%), JSON/YAML parsing and conversion
## Training Configuration
| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit quantization) |
| Max Sequence Length | 512 |
| Epochs | 1 |
| Learning Rate | 1e-06 |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| Batch Size (per device) | 2 |
| Gradient Accumulation | 8 steps |
| Effective Batch Size | 16 |
| Target Modules | |
| Optimizer | AdamW |
| Weight Decay | 0.05 |
| Warmup Ratio | 0.1 |
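The LoRA-specific rows of the table map onto a PEFT config roughly as follows. This is a hypothetical reconstruction: `target_modules` is not documented in this card (the table cell is empty), so the projection names below are only the usual choices for Qwen-style architectures, and `lora_dropout` is likewise an assumption:

```python
# Hypothetical LoraConfig mirroring the training table above.
# target_modules and lora_dropout are assumptions, not from this card.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                       # LoRA Rank (r), from the table
    lora_alpha=128,             # LoRA Alpha, from the table
    lora_dropout=0.0,           # assumption: not stated in the table
    target_modules=[            # assumption: typical Qwen projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)
```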
## Usage

### Option 1: Transformers + PEFT (recommended for CPU / single GPU)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model and adapter
base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "poprap/qwen3-4b-sft-exp10e"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Generate structured output. Use the chat template so the prompt
# matches the instruct model's expected format.
messages = [
    {"role": "user", "content": "Generate a JSON object representing a book with title, author, and year."}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(
    input_ids,
    max_new_tokens=512,
    do_sample=False,  # deterministic decoding for structured output
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
```
### Option 2: vLLM (recommended for production / batch inference)
```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize vLLM with LoRA support
llm = LLM(
    model="Qwen/Qwen3-4B-Instruct-2507",
    enable_lora=True,
    max_lora_rank=64,
    gpu_memory_utilization=0.9,
)

# Deterministic decoding for structured output
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=512,
)

prompts = [
    "Generate a JSON object representing a book with title, author, and year.",
    'Convert the following to YAML: {"name": "Alice", "age": 30}',
]

# The adapter is loaded on demand via the LoRARequest passed to generate()
outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("structeval-adapter", 1, "poprap/qwen3-4b-sft-exp10e"),
)
for output in outputs:
    print(output.outputs[0].text)
```
### Option 3: CLI (quick testing)
```bash
# Install dependencies
pip install transformers peft torch

# Run inference
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = 'Qwen/Qwen3-4B-Instruct-2507'
adapter = 'poprap/qwen3-4b-sft-exp10e'
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map='auto')
model = PeftModel.from_pretrained(model, adapter)
prompt = 'Generate a JSON object representing a person with name and age.'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
"
```
## Installation

```bash
# For Transformers + PEFT
pip install transformers peft torch accelerate bitsandbytes

# For vLLM (faster inference)
pip install vllm

# Download the adapter locally (optional; from_pretrained also downloads it)
huggingface-cli download poprap/qwen3-4b-sft-exp10e
```
## Tips for Best Results

- Use `temperature=0.0` (or `do_sample=False`) for deterministic structured output
- Set `max_tokens` based on the expected output length
- For batch inference, use vLLM for roughly 10-20x higher throughput
- For production, consider merging the adapter into the base model:

  ```python
  merged_model = model.merge_and_unload()
  merged_model.save_pretrained("./merged_model")
  ```
## Competition Context

This model was developed for the Matsuo Lab LLM Course 2025 Final Competition.
- Competition: Main Track - StructEval Benchmark
- Task: Structured output generation with format conversion
- Constraint: Qwen3-4B-Instruct-2507 base model only
- Training Data: Official competition datasets only
## Training Data & License

### Training Datasets

- u-10bei/structured_data_with_cot_dataset_512_v2
- u-10bei/structured_data_with_cot_dataset_512_v4
- u-10bei/structured_data_with_cot_dataset_512_v5
- daichira/structured-3k-mix-sft
- daichira/structured-hard-sft-4k
### License Information
- Dataset License: Creative Commons Attribution (CC-BY-4.0)
- Base Model License: Apache 2.0 (Qwen3-4B-Instruct)
- Adapter License: CC-BY-4.0 (follows training data license)
**Important:** Users must comply with:
- CC-BY-4.0 attribution requirements for training data
- Apache 2.0 terms for the Qwen3 base model
- Responsible AI guidelines
## Troubleshooting

### OOM (Out of Memory) Errors

- Use 4-bit quantization: `load_in_4bit=True`
- Reduce `max_model_len` in vLLM
- Use CPU offloading with `device_map="auto"`
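The 4-bit option can be sketched as below, assuming `bitsandbytes` is installed. The `BitsAndBytesConfig` fields shown are a common NF4 setup, not values taken from this repository's training run:

```python
# Load the base model with 4-bit NF4 quantization to cut memory use,
# then attach the adapter as usual. Requires bitsandbytes on a CUDA GPU.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # assumption: typical QLoRA setting
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, "poprap/qwen3-4b-sft-exp10e")
```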
### Slow Inference
- Switch to vLLM for 10-20x speedup
- Use batch inference when possible
- Consider model merging for repeated use
### Unexpected Output Format

- Check that `temperature=0.0` is set for deterministic output
- Verify the prompt format matches the training data
- Ensure the adapter is properly loaded
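When debugging format issues, it helps to validate the generated text programmatically rather than by eye. A minimal check for JSON outputs (a helper of our own, not part of this repository) might look like:

```python
# Extract and parse the first {...} block from model output, assuming
# the prompt asked for JSON. Returns the parsed object or None.
import json

def extract_json(text):
    """Return the first {...} span in text parsed as JSON, or None."""
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end <= start:
        return None
    try:
        return json.loads(text[start:end + 1])
    except json.JSONDecodeError:
        return None

print(extract_json('Here is the object: {"title": "Dune", "year": 1965}'))
# → {'title': 'Dune', 'year': 1965}
```

The same idea extends to the other formats via `yaml.safe_load`, `tomllib.loads`, `xml.etree.ElementTree.fromstring`, or the `csv` module.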
## Citation

```bibtex
@misc{qwen3-4b-sft-exp10e,
  title={qwen3-4b-sft-exp10e: LoRA Adapter for Structured Output Generation},
  author={Matsuo Lab LLM Competition 2025},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/poprap/qwen3-4b-sft-exp10e}},
}
```
## Links

- Base Model: Qwen/Qwen3-4B-Instruct-2507
- Competition: Matsuo Lab LLM Course 2025
- GitHub: Project Repository
## Contact
For questions or issues, please open an issue on the GitHub repository or contact via HuggingFace discussions.