Qwen3-4B Struct-Eval v3 (LoRA Adapter)

This repository contains a LoRA adapter fine-tuned from Qwen/Qwen3-4B-Instruct-2507 for structured output generation tasks (JSON, YAML, XML, TOML, CSV).

Training Objective

Fine-tuned for structured output generation with improved template alignment and response-focused learning.

Key Features (v3)

  1. Template Alignment: Uses add_generation_prompt=True during training to match inference behavior
  2. Response-Only Loss: Prompt tokens are masked with the label value -100, so the loss is computed only on response tokens
  3. Optimized Hyperparameters: Learning rate 2e-6 with QLoRA (4-bit)
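The response-only loss in point 2 can be sketched in a few lines of pure Python (the token IDs below are made up for illustration; the real pipeline operates on tokenizer output):

```python
IGNORE_INDEX = -100  # PyTorch's CrossEntropyLoss skips positions labeled -100

def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

# Toy IDs: 4 prompt tokens, 3 response tokens
input_ids, labels = build_labels([101, 7, 8, 9], [42, 43, 2])
print(labels)  # [-100, -100, -100, -100, 42, 43, 2]
```

Gradients then flow only from the response span, which keeps the model from learning to reproduce prompt text.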

Training Configuration

| Parameter | Value |
|---|---|
| Base Model | Qwen/Qwen3-4B-Instruct-2507 |
| Method | QLoRA (4-bit quantization) |
| Dataset | u-10bei/structured_data_with_cot_dataset_512_v2 |
| Max Sequence Length | 512 |
| Epochs | 1 |
| Learning Rate | 2e-6 |
| LoRA Rank (r) | 64 |
| LoRA Alpha | 128 |
| LoRA Dropout | 0.0 |
| LoRA Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
| Batch Size (per device) | 2 |
| Gradient Accumulation | 16 |
| Effective Batch Size | 32 |
| Warmup Ratio | 0.1 |
| Weight Decay | 0.05 |
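The LoRA rows of the table map directly onto a peft configuration. The sketch below is an illustrative reconstruction: the LoRA values come from the table, while the bnb_4bit_* quantization settings are typical QLoRA defaults that the card does not state explicitly.

```python
import torch
from peft import LoraConfig
from transformers import BitsAndBytesConfig

# 4-bit quantization (assumed NF4 defaults; not stated in the card)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# LoRA settings taken from the table above
lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    lora_dropout=0.0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```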

Usage

Load Model with Adapter

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

# Model IDs
base_model_id = "Qwen/Qwen3-4B-Instruct-2507"
adapter_id = "astom-M/qwen3-4b-struct-eval-v3-colab"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_id)

# Load base model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, adapter_id)

Inference Example

# Prepare input
messages = [
    {"role": "system", "content": "You are a helpful assistant that generates structured outputs."},
    {"role": "user", "content": "Generate a JSON object with name and age fields for a person named Alice who is 25 years old."}
]

# Apply chat template
input_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True  # Important: matches training
)

# Tokenize
inputs = tokenizer(input_text, return_tensors="pt").to(model.device)

# Generate
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    do_sample=True,
    top_p=0.9,
)

# Decode only the newly generated tokens (skip the prompt)
response = tokenizer.decode(
    outputs[0][inputs["input_ids"].shape[-1]:],
    skip_special_tokens=True,
)
print(response)
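Because the model is trained to emit structured formats, the decoded text is usually parsed and validated before downstream use. A minimal sketch for JSON (the `response` string below is a hypothetical stand-in for real model output):

```python
import json

def extract_json(text):
    """Extract and parse the first-to-last-brace JSON object in a string."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found")
    return json.loads(text[start:end + 1])

# Stand-in for a decoded model response
response = 'Here is the result:\n{"name": "Alice", "age": 25}'
person = extract_json(response)
print(person["name"], person["age"])  # Alice 25
```

A failed parse (raised `ValueError` or `json.JSONDecodeError`) is a convenient signal for retrying generation with different sampling settings.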

Training Details

Dataset

  • Source: u-10bei/structured_data_with_cot_dataset_512_v2
  • Format: Chain-of-Thought (CoT) reasoning + Structured output
  • Split: 95% train / 5% validation

Hardware

  • GPU: NVIDIA T4 (16GB VRAM) on Google Colab
  • Training Time: Approximately 20-30 minutes

Competition Rule Compliance

This model was trained in full compliance with competition rules:

  • Uses only the permitted base model: Qwen3-4B-Instruct-2507
  • Uses only the official training dataset: u-10bei/structured_data_with_cot_dataset_512_v2
  • No LLM-based data generation or augmentation
  • No external APIs or tools used during inference
  • No RAG or tool-use capabilities added
  • No manual editing of inference outputs

Citation

If you use this adapter, please cite the original Qwen3 model:

@article{qwen3-2507,
  title={Qwen3 Technical Report},
  author={Qwen Team},
  year={2025},
  url={https://huggingface.co/Qwen/Qwen3-4B-Instruct-2507}
}

License

This adapter inherits the license from the base model Qwen/Qwen3-4B-Instruct-2507 (Apache 2.0).

Framework Versions:

  • transformers: 4.56.2
  • peft: 0.18.1
  • torch: 2.10.0
  • Python: 3.10