qwen3-4b-sft-exp10f

LoRA adapter for structured output generation (JSON, YAML, TOML, XML, CSV) based on Qwen3-4B-Instruct

This repository provides a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 using QLoRA (4-bit quantization).

Note: This repository contains LoRA adapter weights only. The base model must be loaded separately.

🎯 Training Objective

This adapter is trained to improve structured output generation accuracy across multiple formats:

  • JSON - JavaScript Object Notation
  • YAML - YAML Ain't Markup Language
  • TOML - Tom's Obvious, Minimal Language
  • XML - eXtensible Markup Language
  • CSV - Comma-Separated Values

Training Strategy: Loss is applied only to the final assistant output, while intermediate reasoning (Chain-of-Thought) is masked to improve output quality without overfitting to reasoning patterns.
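The masking strategy above can be sketched with plain token IDs. The helper below is illustrative (not from the actual training code); `-100` is the `ignore_index` that PyTorch's `CrossEntropyLoss` (and hence Hugging Face trainers) skip when computing the loss:

```python
def build_labels(input_ids, completion_start):
    """Mask everything before the final assistant output so the loss
    is computed only on the structured completion tokens.

    input_ids: full token sequence (prompt + CoT + final output)
    completion_start: index where the final assistant output begins
    """
    return [
        -100 if i < completion_start else tok
        for i, tok in enumerate(input_ids)
    ]

# Example: 5 prompt/CoT tokens are masked, 3 output tokens keep their IDs
labels = build_labels([101, 7, 8, 9, 102, 42, 43, 44], completion_start=5)
print(labels)  # [-100, -100, -100, -100, -100, 42, 43, 44]
```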

πŸ“Š Performance

Evaluated on public_150.json (Matsuo Lab LLM Competition 2025):

| Format  | Success Rate | Notes                |
|---------|--------------|----------------------|
| Overall | 82.67%       | -                    |
| JSON    | 98.0%        | Strong performance   |
| YAML    | 85.7%        | Strong performance   |
| CSV     | 100.0%       | Excellent            |
| XML     | 60.0%        | Room for improvement |
| TOML    | 52.0%        | Needs improvement    |

Task-Level Analysis

  • Format conversion: Strong on JSON/YAML/CSV conversions
  • Challenges: CSV-to-JSON/XML/YAML, YAML-to-XML, Text-to-TOML
  • Strengths: CSV generation (100%), JSON/YAML parsing and conversion

βš™οΈ Training Configuration

| Parameter               | Value                       |
|-------------------------|-----------------------------|
| Base Model              | Qwen/Qwen3-4B-Instruct-2507 |
| Method                  | QLoRA (4-bit quantization)  |
| Max Sequence Length     | 512                         |
| Epochs                  | 1                           |
| Learning Rate           | 1e-06                       |
| LoRA Rank (r)           | 64                          |
| LoRA Alpha              | 128                         |
| Batch Size (per device) | 2                           |
| Gradient Accumulation   | 8 steps                     |
| Effective Batch Size    | 16                          |
| Target Modules          |                             |
| Optimizer               | AdamW                       |
| Weight Decay            | 0.05                        |
| Warmup Ratio            | 0.1                         |
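Assuming standard `peft` and `bitsandbytes` APIs were used (the model card does not state this explicitly), the hyperparameters above roughly correspond to a configuration like the following sketch. The compute dtype is an assumption, and the target modules are not listed in the card, so none are specified here:

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig

# 4-bit quantization for QLoRA
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: not stated in the card
)

# LoRA hyperparameters from the table above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    task_type="CAUSAL_LM",
    # target_modules are not specified in the card; peft's defaults apply
)
```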

πŸš€ Usage

Option 1: Transformers + PEFT (Recommended for CPU/single GPU)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Load base model and adapter
base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "poprap/qwen3-4b-sft-exp10f"

tokenizer = AutoTokenizer.from_pretrained(base_model_id)
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(model, adapter_id)

# Generate structured output
prompt = "Generate a JSON object representing a book with title, author, and year."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    do_sample=False,  # greedy decoding; temperature is ignored when sampling is off
)
result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

Option 2: vLLM (Recommended for production/batch inference)

from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Initialize vLLM with LoRA support; the adapter itself is attached
# per request via LoRARequest (see the generate call below)
llm = LLM(
    model="Qwen/Qwen3-4B-Instruct-2507",
    enable_lora=True,
    max_lora_rank=64,
    gpu_memory_utilization=0.9,
)

# Generate with LoRA
sampling_params = SamplingParams(
    temperature=0.0,
    max_tokens=512,
)

prompts = [
    "Generate a JSON object representing a book with title, author, and year.",
    "Convert the following to YAML: {\"name\": \"Alice\", \"age\": 30}",
]

outputs = llm.generate(
    prompts,
    sampling_params,
    lora_request=LoRARequest("structeval-adapter", 1, "poprap/qwen3-4b-sft-exp10f"),
)

for output in outputs:
    print(output.outputs[0].text)

Option 3: CLI Usage (Quick Testing)

# Install dependencies
pip install transformers peft torch

# Run inference
python -c "
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

base = 'Qwen/Qwen3-4B-Instruct-2507'
adapter = 'poprap/qwen3-4b-sft-exp10f'

tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base, torch_dtype=torch.float16, device_map='auto')
model = PeftModel.from_pretrained(model, adapter)

prompt = 'Generate a JSON object representing a person with name and age.'
inputs = tokenizer(prompt, return_tensors='pt').to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
"

πŸ”§ Installation

# For Transformers + PEFT
pip install transformers peft torch accelerate bitsandbytes

# For vLLM (faster inference)
pip install vllm

# Download adapter
hf download poprap/qwen3-4b-sft-exp10f

πŸ’‘ Tips for Best Results

  1. Use greedy decoding for deterministic structured output (temperature=0.0 in vLLM; do_sample=False in Transformers)
  2. Set max_tokens appropriately based on expected output length
  3. For batch inference, use vLLM for 10-20x speedup
  4. For production, consider merging the adapter into the base model (here model is the PeftModel loaded in Option 1):
    merged_model = model.merge_and_unload()
    merged_model.save_pretrained("./merged_model")
    

πŸ† Competition Context

This model was developed for the Matsuo Lab LLM Course 2025 Final Competition (松尾研 LLM 講座 2025 ζœ€η΅‚θͺ²ι‘Œγ‚³γƒ³γƒšγƒ†γ‚£γ‚·γƒ§γƒ³).

  • Competition: Main Track - StructEval Benchmark
  • Task: Structured output generation with format conversion
  • Constraint: Qwen3-4B-Instruct-2507 base model only
  • Training Data: Official competition datasets only

πŸ“¦ Training Data & License

Training Datasets

  • u-10bei/structured_data_with_cot_dataset_512_v2
  • u-10bei/structured_data_with_cot_dataset_512_v4
  • u-10bei/structured_data_with_cot_dataset_512
  • daichira/structured-3k-mix-sft
  • daichira/structured-hard-sft-4k

License Information

  • Dataset License: Creative Commons Attribution (CC-BY-4.0)
  • Base Model License: Apache 2.0 (Qwen3-4B-Instruct)
  • Adapter License: CC-BY-4.0 (follows training data license)

Important: Users must comply with:

  1. CC-BY-4.0 attribution requirements for training data
  2. Apache 2.0 terms for the Qwen3 base model
  3. Responsible AI guidelines

πŸ› Troubleshooting

OOM (Out of Memory) Errors

  • Use 4-bit quantization (pass a BitsAndBytesConfig with load_in_4bit=True to from_pretrained)
  • Reduce max_model_len in vLLM
  • Use CPU offloading with device_map="auto"
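The first two bullets can be combined when loading the base model. A minimal sketch, assuming a bitsandbytes-compatible GPU (the current Transformers API wraps the quantization flags in a BitsAndBytesConfig):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-4B-Instruct-2507",
    quantization_config=bnb_config,
    device_map="auto",  # offloads layers to CPU when GPU memory is tight
)
```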

Slow Inference

  • Switch to vLLM for 10-20x speedup
  • Use batch inference when possible
  • Consider model merging for repeated use

Unexpected Output Format

  • Check that temperature=0.0 for deterministic output
  • Verify prompt format matches training data
  • Ensure adapter is properly loaded
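Before assuming the adapter misfired, it can help to check mechanically whether the generated text parses at all. A minimal validator using only the standard library (JSON, CSV, and XML; YAML and TOML would need extra parsers such as PyYAML or tomllib):

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def validate_output(text, fmt):
    """Return True if `text` parses as the requested format."""
    try:
        if fmt == "json":
            json.loads(text)
        elif fmt == "csv":
            rows = list(csv.reader(io.StringIO(text)))
            if not rows:
                return False
        elif fmt == "xml":
            ET.fromstring(text)
        else:
            raise ValueError(f"unsupported format: {fmt}")
        return True
    except (json.JSONDecodeError, ET.ParseError, csv.Error):
        return False

print(validate_output('{"title": "Dune", "year": 1965}', "json"))  # True
print(validate_output('{"title": "Dune",', "json"))                # False
```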

πŸ“š Citation

@misc{qwen3-4b-sft-exp10f,
  title={qwen3-4b-sft-exp10f: LoRA Adapter for Structured Output Generation},
  author={Matsuo Lab LLM Competition 2025},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/poprap/qwen3-4b-sft-exp10f}},
}

πŸ“§ Contact

For questions or issues, please open an issue on the GitHub repository or contact via HuggingFace discussions.
