Qwen3-8B AIMO3 Tool-Integrated Reasoning

Model Summary

A LoRA fine-tuned version of Qwen3-8B trained for tool-integrated reasoning on the AIMO3 competition dataset (generated by GPT-OSS-120B). The LoRA adapters have been merged into the base model and saved in SafeTensors format for straightforward deployment.

| Property | Details |
|---|---|
| Base Model | Qwen3-8B |
| Fine-tuning Method | LoRA (merged) |
| Format | SafeTensors (BF16) |
| Parameters | ~8B |
| Disk Size | ~16GB |
| Max Context | 8192 tokens |

Model Details

LoRA Configuration

| Hyperparameter | Value |
|---|---|
| Rank (r) | 16 |
| Alpha | 32 |
| Dropout | 0.05 |
| Bias | none |
| Target Modules | q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj |
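For reference, the table above maps onto PEFT's `LoraConfig` roughly as follows. This is a sketch: the `task_type` value is an assumption, not stated in the card.

```python
# LoRA hyperparameters from the table above, as keyword arguments for
# peft.LoraConfig. task_type "CAUSAL_LM" is an assumption, not stated above.
lora_kwargs = dict(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    task_type="CAUSAL_LM",
)

# With peft installed: lora_config = peft.LoraConfig(**lora_kwargs)
```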

Training Hyperparameters

| Hyperparameter | Value |
|---|---|
| Precision | BFloat16 (no quantization) |
| Epochs | 2 (planned; training stopped early) |
| Steps | 8750 (~1 epoch) |
| Per-device Batch Size | 2 |
| Gradient Accumulation Steps | 8 (effective batch size: 16) |
| Learning Rate | 2e-4 |
| LR Scheduler | Cosine with warmup |
| Warmup Ratio | 0.03 |
| Weight Decay | 0.01 |
| Max Gradient Norm | 1.0 |
| Max Sequence Length | 8192 |
| Optimizer | AdamW (fused) |
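Two quantities in the table are derived from the others. A quick sanity check, assuming the warmup ratio is applied over the 8750 steps actually run:

```python
# Derived quantities from the training hyperparameters above.
per_device_batch = 2
grad_accum = 8
num_gpus = 1  # single H100
effective_batch = per_device_batch * grad_accum * num_gpus  # 16, as stated

total_steps = 8750
warmup_ratio = 0.03
warmup_steps = int(total_steps * warmup_ratio)  # ~262 warmup steps

print(effective_batch, warmup_steps)
```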

Hardware & Infrastructure

  • Platform: Kaggle
  • GPU: Single NVIDIA H100 (80GB)
  • Attention: Flash Attention 2
  • Optimizations: Gradient checkpointing, TF32, fused optimizer

Training Data

Supported column names:

  • Input: problem, question, input, prompt
  • Output: solution, answer, output, response, completion
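The column aliases above suggest a resolution step that picks the first matching field from each record. A hypothetical helper (`resolve_columns` is illustrative, not part of the released code):

```python
# Hypothetical helper showing how the supported column names could be
# resolved: the first matching input/output column in each record wins.
INPUT_COLUMNS = ("problem", "question", "input", "prompt")
OUTPUT_COLUMNS = ("solution", "answer", "output", "response", "completion")

def resolve_columns(record: dict) -> tuple[str, str]:
    """Return the (input, output) text from a raw dataset record."""
    prompt = next(record[c] for c in INPUT_COLUMNS if c in record)
    response = next(record[c] for c in OUTPUT_COLUMNS if c in record)
    return prompt, response

print(resolve_columns({"question": "What is 2 + 2?", "answer": "4"}))
# -> ('What is 2 + 2?', '4')
```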

Instruction Format

Training uses a ChatML-style format:

<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>

Training Loss

The model was trained for 8750 steps (~1 epoch) before stopping. Below are the training and validation loss curves for the entire run.

Training Loss Plot


Usage

Load the Model

Since the LoRA adapters are already merged, PEFT is not required:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    trust_remote_code=True,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

tokenizer = AutoTokenizer.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    trust_remote_code=True
)

Inference

prompt = "Solve this problem: What is 2 + 2?"

formatted_prompt = f"<|im_start|>user\n{prompt}<|im_end|>\n<|im_start|>assistant\n"

inputs = tokenizer(formatted_prompt, return_tensors="pt").to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=512,
    temperature=0.7,
    top_p=0.9,
    do_sample=True
)

response = tokenizer.decode(outputs[0], skip_special_tokens=False)
print(response)

Batch Inference

prompts = [
    "Solve: 15 + 27 = ?",
    "What is the derivative of x^2?",
    "Calculate the area of a circle with radius 5"
]

formatted_prompts = [
    f"<|im_start|>user\n{p}<|im_end|>\n<|im_start|>assistant\n"
    for p in prompts
]

# Decoder-only models need left padding for correct batched generation
tokenizer.padding_side = "left"
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

inputs = tokenizer(formatted_prompts, return_tensors="pt", padding=True).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512, temperature=0.7, do_sample=True)

for response in tokenizer.batch_decode(outputs, skip_special_tokens=False):
    print(response)
    print("-" * 80)

Quantized Inference (Lower VRAM)

from transformers import BitsAndBytesConfig  # requires the bitsandbytes package

# 8-bit (~8GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto"
)

# 4-bit (~4GB VRAM)
model = AutoModelForCausalLM.from_pretrained(
    "tensorhydra/qwen-8b-aimo3-reasoning",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto"
)

Memory Requirements

| Mode | VRAM |
|---|---|
| BF16 (full) | ~16GB |
| 8-bit quantized | ~8GB |
| 4-bit quantized | ~4GB |
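The figures above follow from a back-of-envelope weight-memory estimate: parameter count times bytes per parameter (activations and KV cache need extra headroom on top):

```python
# Rough weight memory behind the table above: ~8B parameters times
# bytes per parameter. Runtime usage is higher (activations, KV cache).
params = 8e9
for mode, bytes_per_param in [("BF16", 2.0), ("8-bit", 1.0), ("4-bit", 0.5)]:
    print(f"{mode}: ~{params * bytes_per_param / 1e9:.0f} GB")
```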

Repository Structure

model/
├── config.json
├── generation_config.json
├── model.safetensors.index.json
├── model-00001-of-0000X.safetensors
├── ...
├── tokenizer_config.json
├── tokenizer.json
└── special_tokens_map.json
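The index file maps each tensor name to the shard that contains it, so the actual shard list can be recovered from it. A small sketch (the exact shard count depends on the upload, as the `0000X` placeholder above indicates):

```python
import json
from pathlib import Path

def list_shards(model_dir: str) -> list[str]:
    """Return the shard files referenced by model.safetensors.index.json."""
    index_path = Path(model_dir, "model.safetensors.index.json")
    index = json.loads(index_path.read_text())
    # weight_map maps tensor name -> shard filename; deduplicate and sort.
    return sorted(set(index["weight_map"].values()))
```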

Intended Use

  • Mathematical reasoning and problem solving
  • Tool-integrated step-by-step reasoning
  • Educational and research applications
  • Production deployment (merged model, no PEFT dependency)

Limitations

  • Fine-tuned on a narrow reasoning domain; may not generalize well to other tasks
  • Hard context limit of 8192 tokens
  • Performance is bounded by the quality and distribution of the synthetic training data
  • Full merged model requires ~16GB storage (vs. ~100–200MB for LoRA adapters alone)


Citation

@misc{qwen-lora-aimo3,
  title   = {Qwen-8B LoRA Fine-tuned for Tool-Integrated Reasoning},
  author  = {tensorhydra},
  year    = {2025},
  howpublished = {Kaggle Model Hub},
  note    = {Merged LoRA model in SafeTensors format}
}

Acknowledgements

  • Base model: Qwen3-8B by Alibaba Cloud
  • Training frameworks: Hugging Face Transformers & PEFT
  • Dataset synthesis: GPT-OSS-120B
  • Serialization: SafeTensors
  • Training platform: Kaggle (H100 GPU)

License

This model inherits the license of the base Qwen3-8B model. Please refer to the Qwen license terms before use.
