## Model Details
- Base model: Qwen/Qwen2.5-3B-Instruct
- Fine-tuning method: LoRA (PEFT)
- LoRA rank (r): 16
- LoRA alpha: 32
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj
- LoRA dropout: 0.04
- Trainable parameters: ~0.59% of total (≈18.35M params)
- Training precision: bfloat16
- Optimizer: adamw_torch
- Learning rate: 1.8e-4 (cosine schedule + 6% warmup)
- Epochs: 3
- Effective batch size: 24 (per_device=4 × accum=6)
- Gradient checkpointing: enabled (non-reentrant)
- Context length used: 3072 tokens
- Hardware: AMD Instinct MI300X (192 GB HBM3) – ROCm
- Training dataset: Xerv-AI/GRAD (full train split, graduate-level math & proofs)
- Training date: March 2026
- Upload date: March 2026
## Intended Use

This is a specialized 3B reasoning model fine-tuned to produce highly consistent, nested, structured Chain-of-Thought output in the following rigid tag format:

```
<plan> … high-level decomposition … </plan>
<reasoning>
<step index="1"> … </step>
<step index="2"> … <verify>optional verification</verify> … </step>
…
</reasoning>
<conclusion>\boxed{final answer}</conclusion>
```
It is designed for:
- Mathematical proof generation
- Step-by-step scientific reasoning
- Competition-style problem solving (AMC, AIME, IMO shortlist level)
- Educational tools that require verifiable, auditable reasoning traces
- Agents / tool-use pipelines that parse structured reasoning
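For pipelines in the last category, the tag format above can be parsed with plain regular expressions. The sketch below is illustrative only (the function name and return shape are not part of the released model's API) and assumes the model emitted well-formed tags:

```python
import re

def parse_reasonborn(text: str) -> dict:
    """Parse the <plan>/<reasoning>/<conclusion> tag format into a dict.

    Illustrative helper, not part of the model repo. Missing sections
    come back as None / empty list.
    """
    def grab(tag):
        m = re.search(rf"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
        return m.group(1).strip() if m else None

    # Each <step index="N"> body may contain an optional <verify> sub-tag.
    steps = [
        {"index": int(i), "text": body.strip()}
        for i, body in re.findall(
            r'<step index="(\d+)">(.*?)</step>', text, re.DOTALL
        )
    ]
    boxed = re.search(r"\\boxed\{(.*?)\}", grab("conclusion") or "")
    return {
        "plan": grab("plan"),
        "steps": steps,
        "answer": boxed.group(1) if boxed else None,
    }

sample = (
    "<plan>Factor and check parity.</plan>\n<reasoning>\n"
    '<step index="1">Assume p/q in lowest terms.<verify>ok</verify></step>\n'
    "</reasoning>\n<conclusion>\\boxed{irrational}</conclusion>"
)
parsed = parse_reasonborn(sample)
```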
## Prompting Recommendation

Strongly recommended inference prompt (copy-paste this):

```
<|im_start|>system
You are ReasonBorn – rigorous scientific & mathematical reasoner.
Respond **only** using this exact nested structure:
<plan>…</plan>
<reasoning> containing multiple <step index="…"> tags (with optional <verify> sub-tags)
<conclusion>\boxed{…}</conclusion>
Never write text outside the tags. Never skip tags.
<|im_end|>
<|im_start|>user
{question}
<|im_end|>
<|im_start|>assistant
```

A low temperature (0.1–0.25) combined with top_p ≈ 0.90–0.95 usually gives the cleanest structure.
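A minimal sketch of assembling this prompt programmatically; the `build_prompt` helper and `SYSTEM_PROMPT` constant are illustrative names, not part of the model repo:

```python
# Illustrative helper (not part of the model's API): assembles the
# recommended ChatML prompt for a given question.
SYSTEM_PROMPT = (
    "You are ReasonBorn – rigorous scientific & mathematical reasoner.\n"
    "Respond **only** using this exact nested structure:\n"
    "<plan>…</plan>\n"
    '<reasoning> containing multiple <step index="…"> tags '
    "(with optional <verify> sub-tags)\n"
    "<conclusion>\\boxed{…}</conclusion>\n"
    "Never write text outside the tags. Never skip tags."
)

def build_prompt(question: str) -> str:
    # ChatML framing: system turn, user turn, then an open assistant turn.
    return (
        f"<|im_start|>system\n{SYSTEM_PROMPT}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

prompt = build_prompt("Prove that √2 is irrational.")
```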
## Training Script

Note: this script is from the multi-dataset ("16ds") run; its hyperparameters (context length, epochs, learning rate, batch size) differ from the GRAD-only configuration summarized in the Model Details above.
```python
import os
import gc
import re
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

import torch
from huggingface_hub import login, HfApi
from datasets import load_dataset, Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    TrainingArguments,
    Trainer,
    DataCollatorForLanguageModeling,
)
from peft import LoraConfig, get_peft_model

os.environ["TOKENIZERS_PARALLELISM"] = "false"

MODEL_ID = "Qwen/Qwen2.5-3B"
REPO_NAME = "rb-qwen3b-16ds-lora"
SAVE_DIR = "./rb-qwen-16ds-lora-final"
MAX_CTX = 512
EPOCHS = 1.15
LR = 2.5e-4
LORA_R = 16
LORA_ALPHA = 32
BATCH_SIZE = 48
GRAD_ACCUM = 2
WORKERS = 12

DATA_MIX = {
    "NuminaMath": {"path": "AI-MO/NuminaMath-CoT", "max_samples": 60000, "split": "train"},
    "OrcaMath": {"path": "microsoft/orca-math-word-problems-200k", "max_samples": 60000, "split": "train"},
    "UltraMath-Conv": {"path": "openbmb/UltraData-Math", "config": "UltraData-Math-L3-Conversation-Synthetic", "max_samples": 50000, "split": "train"},
    "GSM8K": {"path": "openai/gsm8k", "config": "main", "max_samples": 7473, "split": "train"},
    "AI2_ARC": {"path": "allenai/ai2_arc", "config": "ARC-Challenge", "max_samples": 7500, "split": "train"},
    "SciQ": {"path": "sciq", "max_samples": 11679, "split": "train"},
    "OpenBookQA": {"path": "openbookqa", "max_samples": 4957, "split": "train"},
    "GPQA": {"path": "Idavidrein/gpqa", "config": "gpqa_diamond", "max_samples": 198, "split": "train"},
    "ChemistryQA": {"path": "avaliev/ChemistryQA", "max_samples": 4000, "split": "train"},
    "HLE": {"path": "cais/hle", "max_samples": 2700, "split": "test"},
    "GRAD": {"path": "Xerv-AI/GRAD", "max_samples": 1933, "split": "train"},
}

def format_example(ex):
    """Convert a raw example into the ChatML + nested-tag training text."""
    try:
        q = str(ex.get("question") or ex.get("problem") or ex.get("prompt") or "").strip()
        s = str(ex.get("answer") or ex.get("solution") or ex.get("response") or "").strip()
        if len(q) < 5 or len(s) < 5:
            return None
        boxed = re.search(r'\\boxed\{(.*?)\}', s, re.DOTALL)
        ans = boxed.group(1).strip() if boxed else s[:80]
        reasoning = re.sub(r'\\boxed\{.*?\}', '', s, flags=re.DOTALL).strip()
        # Keep at most 5 non-trivial lines as reasoning steps.
        steps = [l.strip() for l in reasoning.split('\n') if len(l.strip()) > 8][:5]
        xml = "<plan>Decompose→reason→verify→conclude.</plan>\n<reasoning>\n"
        for i, step in enumerate(steps, 1):
            # Attach a <verify> sub-tag to the final step only.
            v = "<verify>ok</verify>" if i == len(steps) else ""
            xml += f'<step index="{i}">{step}{v}</step>\n'
        xml += f"</reasoning>\n<conclusion>\\boxed{{{ans}}}</conclusion>"
        sys_p = "You are ReasonBorn. Output only: <plan>,<reasoning><step>...</step></reasoning>,<conclusion>\\boxed{}."
        return {"text": (
            f"<|im_start|>system\n{sys_p}<|im_end|>\n"
            f"<|im_start|>user\n{q}<|im_end|>\n"
            f"<|im_start|>assistant\n{xml}<|im_end|>"
        )}
    except Exception:
        return None

def load_one(name, cfg):
    """Load one dataset from DATA_MIX, falling back to streaming mode."""
    examples = []
    kwargs = {"split": cfg["split"], "trust_remote_code": True}
    if "config" in cfg:
        kwargs["name"] = cfg["config"]
    try:
        ds = load_dataset(cfg["path"], **kwargs)
        if len(ds) > cfg["max_samples"]:
            ds = ds.select(range(cfg["max_samples"]))
        for ex in ds:
            r = format_example(ex)
            if r:
                examples.append(r)
        return name, examples, "ok"
    except Exception:
        pass
    # Fall back to streaming if the full download fails.
    try:
        ds = load_dataset(cfg["path"], streaming=True, **kwargs)
        for ex in ds:
            if len(examples) >= cfg["max_samples"]:
                break
            r = format_example(ex)
            if r:
                examples.append(r)
        return name, examples, "stream"
    except Exception:
        return name, [], "failed"

login()

# Load all datasets concurrently.
all_ex = []
with ThreadPoolExecutor(max_workers=6) as pool:
    futs = {pool.submit(load_one, n, c): n for n, c in DATA_MIX.items()}
    for fut in as_completed(futs):
        n, exs, status = fut.result()
        all_ex.extend(exs)

train_ds = Dataset.from_list(all_ex).shuffle(seed=42)
del all_ex
gc.collect()

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"

tokenized = train_ds.map(
    lambda b: tokenizer(b["text"], truncation=True, max_length=MAX_CTX, padding=False),
    batched=True, batch_size=4000, num_proc=16,
    remove_columns=["text"],
)
tokenized = tokenized.filter(lambda x: len(x["input_ids"]) >= 8, num_proc=16)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    attn_implementation="eager",
)
model = model.to("cuda")
torch.cuda.synchronize()
model.gradient_checkpointing_enable(gradient_checkpointing_kwargs={"use_reentrant": False})
model.enable_input_require_grads()

model = get_peft_model(model, LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
))

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

args = TrainingArguments(
    output_dir="./chk",
    num_train_epochs=EPOCHS,
    per_device_train_batch_size=BATCH_SIZE,
    gradient_accumulation_steps=GRAD_ACCUM,
    gradient_checkpointing=True,
    optim="adamw_torch_fused",
    learning_rate=LR,
    bf16=True,
    fp16=False,
    logging_steps=25,
    save_strategy="steps",
    save_steps=500,
    save_total_limit=2,
    warmup_ratio=0.05,
    lr_scheduler_type="cosine",
    weight_decay=0.01,
    max_grad_norm=0.5,
    dataloader_num_workers=WORKERS,
    dataloader_pin_memory=True,
    dataloader_prefetch_factor=4,
    report_to="none",
    remove_unused_columns=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()

os.makedirs(SAVE_DIR, exist_ok=True)
trainer.save_model(SAVE_DIR)
tokenizer.save_pretrained(SAVE_DIR)
```
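As a rough sanity check on the schedule, assuming every dataset contributes its full `max_samples` quota (filtering and failed loads will reduce this in practice), the optimizer-step count works out as follows:

```python
import math

# Step-count estimate for the script above. Assumption: each dataset
# yields its full max_samples quota, which is an upper bound.
max_samples = [60000, 60000, 50000, 7473, 7500, 11679, 4957, 198, 4000, 2700, 1933]
num_examples = sum(max_samples)
effective_batch = 48 * 2  # BATCH_SIZE × GRAD_ACCUM
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = math.ceil(1.15 * steps_per_epoch)  # EPOCHS = 1.15
```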
## Performance Notes (March 2026 observations)
After only 3 epochs on GRAD:
✅ Very strong format adherence when strongly prompted
✅ Good proof structure and logical flow on number theory, algebra, basic inequalities
✅ Often includes verification steps (especially on last step)
⚠️ Format can still degrade on very long / multi-part questions without strong system prompt
⚠️ Generalization to non-math domains is limited (this is a math-first fine-tune)
⚠️ Weaker zero-shot format obedience compared to multi-dataset versions
## Training Hyperparameters Summary
| Parameter | Value |
|---|---|
| Epochs | 3 |
| Per-device batch size | 4 |
| Gradient accumulation steps | 6 |
| Global batch size | 24 |
| Learning rate | 1.8 × 10⁻⁴ |
| LR scheduler | cosine |
| Warmup ratio | 0.06 |
| Weight decay | 0.015 |
| Max grad norm | 0.8 |
| Optimizer | adamw_torch |
| Mixed precision | bf16 |
| Gradient checkpointing | Yes |
## VRAM Usage (MI300X 192 GB)
| Stage | Approx. Reserved VRAM | Utilization |
|---|---|---|
| After model load | ~7–12 GiB | ~4–6% |
| After LoRA injection | ~8–15 GiB | ~5–8% |
| Peak during training | ~140–175 GiB | ~73–91% |
| After training (inference) | ~40–60 GiB | ~21–31% |
## How to Use (minimal example)

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-3B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "Phase-Technologies/ReasonBorn-Qwen-3B")
tokenizer = AutoTokenizer.from_pretrained("Phase-Technologies/ReasonBorn-Qwen-3B")

prompt = """<|im_start|>system
You are ReasonBorn. Use <plan>, <reasoning> with <step> & <verify>, <conclusion> strictly.
<|im_end|>
<|im_start|>user
Prove that √2 is irrational.
<|im_end|>
<|im_start|>assistant
"""

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(
    **inputs,
    max_new_tokens=1200,
    do_sample=True,  # required for temperature/top_p to take effect
    temperature=0.2,
    top_p=0.92,
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
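Since format adherence can still degrade on long, multi-part questions (see the performance notes above), it may be worth validating the decoded text before handing it to downstream parsing. This is an illustrative sketch, not part of the model's API:

```python
import re

# Illustrative post-generation check: confirm the decoded text contains
# the full <plan>/<reasoning>/<conclusion> structure before parsing it.
REQUIRED = [
    r"<plan>.*?</plan>",
    r'<reasoning>.*?<step index="1">.*?</reasoning>',
    r"<conclusion>\\boxed\{.*?\}</conclusion>",
]

def is_well_formed(text: str) -> bool:
    return all(re.search(p, text, re.DOTALL) for p in REQUIRED)

good = (
    '<plan>p</plan>\n<reasoning>\n<step index="1">s</step>\n'
    "</reasoning>\n<conclusion>\\boxed{42}</conclusion>"
)
```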
## Acknowledgments
- Qwen team for the excellent base model
- Xerv-AI for releasing GRAD – one of the cleanest graduate-level math reasoning datasets available in 2026
- Hugging Face for the ecosystem
- AMD ROCm team for making MI300X training possible
Phase-Technologies / ReasonBorn-Qwen-3B
First step toward verifiable, tagged, auditable AI mathematical reasoning.
Trained in Kolkata, March 2026.