qwen3-4b-sft-exp10e

A LoRA adapter for structured output generation (JSON, YAML, TOML, XML, CSV), built on Qwen/Qwen3-4B-Instruct-2507.

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization).

Note: This repository contains LoRA adapter weights only. The base model must be loaded separately.

🎯 Training Objective

This adapter is trained to improve structured output generation accuracy across multiple formats:

  • JSON - JavaScript Object Notation
  • YAML - YAML Ain't Markup Language
  • TOML - Tom's Obvious, Minimal Language
  • XML - eXtensible Markup Language
  • CSV - Comma-Separated Values

Training Strategy: Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked to improve output quality without overfitting to reasoning patterns.
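As a minimal illustration of this masking scheme (the function name and token IDs below are hypothetical, not taken from the actual training code), labels for the prompt and reasoning tokens can be set to -100 so that the cross-entropy loss ignores them and only the final output contributes gradient:

```python
IGNORE_INDEX = -100  # label value ignored by cross-entropy loss in PyTorch/Transformers

def mask_labels(token_ids, final_output_start):
    """Build labels where loss applies only to the final assistant output.

    token_ids: full token sequence (prompt + CoT reasoning + final output).
    final_output_start: index where the final structured output begins.
    """
    labels = list(token_ids)
    for i in range(final_output_start):
        labels[i] = IGNORE_INDEX  # mask prompt and intermediate reasoning
    return labels

# Example: 6 tokens, final structured output starts at index 4
labels = mask_labels([11, 22, 33, 44, 55, 66], final_output_start=4)
print(labels)  # [-100, -100, -100, -100, 55, 66]
```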

πŸ“Š Performance

Evaluated on public_150.json (Matsuo Lab LLM Competition 2025):

| Format  | Success Rate | Notes                |
|---------|-------------:|----------------------|
| Overall | 85.33%       | -                    |
| JSON    | 98.0%        | Strong performance   |
| YAML    | 91.4%        | Strong performance   |
| CSV     | 100.0%       | Excellent            |
| XML     | 60.0%        | Room for improvement |
| TOML    | 60.0%        | Needs improvement    |

Task-Level Analysis

  • Format conversion: Strong on JSON/YAML/CSV conversions
  • Challenges: CSV-to-JSON/XML/YAML, YAML-to-XML, Text-to-TOML
  • Strengths: CSV generation (100%), JSON/YAML parsing and conversion

βš™οΈ Training Configuration

| Parameter                | Value                       |
|--------------------------|-----------------------------|
| Base Model               | Qwen/Qwen3-4B-Instruct-2507 |
| Method                   | QLoRA (4-bit quantization)  |
| Max Sequence Length      | 512                         |
| Epochs                   | 1                           |
| Learning Rate            | 1e-06                       |
| LoRA Rank (r)            | 64                          |
| LoRA Alpha               | 128                         |
| Batch Size (per device)  | 2                           |
| Gradient Accumulation    | 8 steps                     |
| Effective Batch Size     | 16                          |
| Target Modules           | (not specified)             |
| Optimizer                | AdamW                       |
| Weight Decay             | 0.05                        |
| Warmup Ratio             | 0.1                         |
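The table above maps roughly onto a PEFT/bitsandbytes setup like the following sketch. The target modules are not specified in this card, so the attention projections shown are an assumption, and the quantization details (`nf4`, compute dtype) are common QLoRA defaults rather than confirmed settings:

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization for QLoRA (quant type and compute dtype are assumed defaults)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA hyperparameters taken from the table above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    # Assumed target modules; not listed in this card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```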

πŸš€ Usage

Option 1: Transformers + PEFT (Recommended for CPU/single GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model and adapter
base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "poprap/qwen3-4b-sft-exp10e"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Generate structured output
prompt = "Generate a JSON object representing a book with title, author, and year."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding for deterministic structured output
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Option 2: vLLM (Recommended for production/batch inference)

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize vLLM with LoRA support
llm = LLM(
    model="Qwen/Qwen3-4B-Instruct-2507",
    enable_lora=True,
    max_lora_rank=64,
    gpu_memory_utilization=0.9,
)

# Generate with LoRA
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=512,
)

prompts = [
    "Generate a JSON object representing a book with title, author, and year.",
    "Convert the following to YAML: {\"name\": \"Alice\", \"age\": 30}",
]

outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("structeval-adapter", 1, "poprap/qwen3-4b-sft-exp10e"),
)

for output in outputs:
    print(output.outputs[0].text)

Option 3: CLI Usage (Quick Testing)

# Install dependencies
pip install transformers peft torch

# Run inference
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = 'Qwen/Qwen3-4B-Instruct-2507'
adapter = 'poprap/qwen3-4b-sft-exp10e'

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map='auto')
model = PeftModel.from_pretrained(model, adapter)

prompt = 'Generate a JSON object representing a person with name and age.'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
"

πŸ”§ Installation

# For Transformers + PEFT
pip install transformers peft torch accelerate bitsandbytes

# For vLLM (faster inference)
pip install vllm

# Download adapter (also fetched automatically by from_pretrained)
hf download poprap/qwen3-4b-sft-exp10e

πŸ’‘ Tips for Best Results

  1. Use greedy decoding for deterministic structured output (do_sample=False in Transformers, temperature=0.0 in vLLM)
  2. Set max_tokens appropriately based on expected output length
  3. For batch inference, use vLLM for 10-20x speedup
  4. For production, consider merging adapter with base model:
    merged_model = model.merge_and_unload()
    merged_model.save_pretrained("./merged_model")
    

πŸ† Competition Context

This model was developed for the 松尾研 LLM 講座 2025 ζœ€η΅‚θͺ²ι‘Œγ‚³γƒ³γƒšγƒ†γ‚£γ‚·γƒ§γƒ³ (Matsuo Lab LLM Course 2025 Final Competition).

  • Competition: Main Track - StructEval Benchmark
  • Task: Structured output generation with format conversion
  • Constraint: Qwen3-4B-Instruct-2507 base model only
  • Training Data: Official competition datasets only

πŸ“¦ Training Data & License

Training Datasets

  • u-10bei/structured_data_with_cot_dataset_512_v2
  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512_v5
  • daichira/structured-3k-mix-sft
  • daichira/structured-hard-sft-4k

License Information

  • Dataset License: Creative Commons Attribution (CC-BY-4.0)
  • Base Model License: Apache 2.0 (Qwen3-4B-Instruct)
  • Adapter License: CC-BY-4.0 (follows training data license)

Important: Users must comply with:

  1. CC-BY-4.0 attribution requirements for training data
  2. Apache 2.0 terms for the Qwen3 base model
  3. Responsible AI guidelines

πŸ› Troubleshooting

OOM (Out of Memory) Errors

  • Use 4-bit quantization: pass quantization_config=BitsAndBytesConfig(load_in_4bit=True) to from_pretrained
  • Reduce max_model_len in vLLM
  • Use CPU offloading with device_map="auto"
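The vLLM memory tips above can be sketched as follows. The specific values (context length 1024, 70% GPU memory budget) are illustrative assumptions, not recommended settings from this card:

```python
from vllm import LLM

# A smaller context window and memory budget shrink the KV-cache allocation
llm = LLM(
    model="Qwen/Qwen3-4B-Instruct-2507",
    enable_lora=True,
    max_model_len=1024,         # reduced from the model's default context length
    gpu_memory_utilization=0.7, # leave headroom instead of the 0.9 used above
)
```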

Slow Inference

  • Switch to vLLM for 10-20x speedup
  • Use batch inference when possible
  • Consider model merging for repeated use

Unexpected Output Format

  • Check that temperature=0.0 for deterministic output
  • Verify prompt format matches training data
  • Ensure adapter is properly loaded

πŸ“š Citation

@misc{qwen3-4b-sft-exp10e,
  title={qwen3-4b-sft-exp10e: LoRA Adapter for Structured Output Generation},
  author={Matsuo Lab LLM Competition 2025},
  year={2026},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/poprap/qwen3-4b-sft-exp10e}},
}

πŸ“§ Contact

For questions or issues, please open an issue on the GitHub repository or contact via HuggingFace discussions.
