# Model Summary
We release Qwen3-0.6B-Calculator, a fine-tuned version of the Qwen3-0.6B model (thinking variant), optimized for mathematical reasoning and structured calculator tool calling. It was trained with Group Relative Policy Optimization (GRPO) to externalize arithmetic computations into executable YAML-based tool calls.
## Model Description
Instead of relying on implicit arithmetic in free-form text, Qwen3-0.6B-Calculator follows a strict two-step process:
- **Thinking Phase:** generates internal reasoning within `<thought>` tags.
- **Tool Call Phase:** generates a single, valid, nested YAML expression within `<calculator>` tags.
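This two-phase format can be checked mechanically before executing anything. A minimal sketch (the helper name `is_well_formed` and the regexes are illustrative, not part of the released code):

```python
import re

def is_well_formed(response: str) -> bool:
    """Check that a response contains exactly one <thought> block
    followed by exactly one <calculator> block."""
    thoughts = re.findall(r"<thought>.*?</thought>", response, re.DOTALL)
    calls = re.findall(r"<calculator>.*?</calculator>", response, re.DOTALL)
    if len(thoughts) != 1 or len(calls) != 1:
        return False
    # The tool call must come after the reasoning.
    return response.index("<thought>") < response.index("<calculator>")

ok = is_well_formed('<thought>2+2</thought>\n<calculator>\noperation: "add"\noperands: [2, 2]\n</calculator>')
bad = is_well_formed('<calculator>operation: "add"</calculator>')
print(ok, bad)  # True False
```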
The model is specifically designed to solve GSM8K-style word problems by offloading the final calculation to a deterministic calculator, significantly reducing "hallucination" in multi-step arithmetic.
## Performance (GSM8K Tool-Calling Accuracy)
The model was evaluated on the GSM8K test set (1,319 samples). Below is a comparison of tool-calling accuracy before and after Reinforcement Learning (RL), alongside other base models.
| Model | Before RL | After RL (GRPO) | Absolute Improvement |
|---|---|---|---|
| Llama 3.2-1B Instruct | 4.46% | 14.56% | +10.10% |
| Qwen 2.5-1.5B Instruct | 15.77% | 23.50% | +7.73% |
| Qwen3-0.6B (Thinking) | ~0.00% | 49.50% | +49.50% |
We also observed a significant decrease in the number of `<think>` tokens after RL, which often indicates a more stable reasoning process.
## Training Procedure
The model was trained on a single NVIDIA GPU (24 GB) with the following configuration:
| Parameter | Value |
|---|---|
| Method | GRPO |
| Dataset | GSM8K (7,470 training samples) |
| Learning Rate | 1e-5 (Cosine scheduler, 0.1 warmup) |
| Rollouts (G) | 4 generations per prompt |
| Batch Size | 4 |
| Max Output Length | 512 |
| Precision | BF16 |
| Sampling Temperature | 0.6 |
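For context on the "Rollouts (G)" row: GRPO scores each prompt's G rollouts against one another, normalizing each reward by the group's own mean and standard deviation. A simplified illustration (not the released training code; the function name is hypothetical):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style group-relative advantage:
    A_i = (r_i - mean(r)) / std(r), computed within one prompt's rollouts."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in rewards]

# G = 4 rollouts for one prompt: two correct (reward 1), two incorrect (reward 0).
print(group_relative_advantages([1.0, 0.0, 1.0, 0.0]))  # [1.0, -1.0, 1.0, -1.0]
```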
## Use with Transformers
To use this model, you must implement parsing logic to extract and execute the YAML calculator calls. We recommend `transformers >= 4.51.0` to avoid the unrecognized `qwen3` architecture error.
```python
import torch
import re
import yaml
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "AbleCredit/Qwen3-0.6B-Calculator"

SYSTEM_PROMPT = """You are a mathematical reasoning agent.
1. Break down the problem into logical steps inside <thought> tags.
2. Convert the final expression into a SINGLE, VALID, NESTED calculator tool call inside <calculator> tags using YAML.
Operations: add, subtract, multiply, divide.
Example:
<thought>Natalia sold 48 clips in April. In May she sold half: 48/2=24. Total: 48+24=72.</thought>
<calculator>
operation: "add"
operands:
  - 48
  - operation: "divide"
    operands: [48, 2]
</calculator>"""

# --- Calculator helpers ---

def clean_yaml_load(text):
    """Strip YAML comments, then parse."""
    text = re.sub(r'#.*', '', text)
    return yaml.safe_load(text)

def calculate_recursive(data):
    """Recursively evaluate a nested operation tree."""
    if isinstance(data, (int, float)):
        return float(data)
    if not isinstance(data, dict):
        try:
            return float(str(data))
        except ValueError:
            return 0.0
    op = data.get('operation', '').lower()
    operands = data.get('operands', [])
    if not operands:
        return 0.0
    vals = [calculate_recursive(o) for o in operands]
    if op == 'add':
        return sum(vals)
    if op == 'subtract':
        return vals[0] - (vals[1] if len(vals) > 1 else 0)
    if op == 'multiply':
        res = 1.0
        for x in vals:
            res *= x
        return res
    if op == 'divide':
        return vals[0] / vals[1] if (len(vals) > 1 and vals[1] != 0) else 0.0
    return 0.0

def get_calculator_result(content_text):
    """Extract the <calculator> block and evaluate it; return None on failure."""
    try:
        match = re.search(r'<calculator>(.*?)</calculator>', content_text, re.DOTALL)
        if not match:
            return None
        data = clean_yaml_load(match.group(1).strip())
        return calculate_recursive(data)
    except Exception:
        return None

# --- Inference ---

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto")

question = "Janet has 30 apples. She gives 5 to her sister and 3 to her brother. Then she buys twice as many as she has left. How many apples does she have now?"

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
response = tokenizer.decode(output_ids, skip_special_tokens=True)

# --- Execute the tool call ---
predicted_val = get_calculator_result(response)
print(f"Model Response:\n{response}")
print(f"Final Calculated Answer: {predicted_val}")
```
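As a sanity check, the nested example from the system prompt evaluates to 72. The snippet below re-implements the recursive evaluation on the parsed structure, written here as the Python dict that `yaml.safe_load` would produce (a standalone sketch, independent of the full script above; the `evaluate` helper is illustrative):

```python
# The structure yaml.safe_load would produce for the system prompt's example:
EXAMPLE = {
    "operation": "add",
    "operands": [48, {"operation": "divide", "operands": [48, 2]}],
}

def evaluate(node):
    """Evaluate a parsed calculator tree: numbers are leaves,
    dicts are {operation, operands}."""
    if isinstance(node, (int, float)):
        return float(node)
    vals = [evaluate(o) for o in node["operands"]]
    op = node["operation"]
    if op == "add":
        return sum(vals)
    if op == "subtract":
        return vals[0] - vals[1]
    if op == "multiply":
        return vals[0] * vals[1]
    if op == "divide":
        return vals[0] / vals[1]
    raise ValueError(f"unknown operation: {op}")

print(evaluate(EXAMPLE))  # 72.0  (48 + 48/2)
```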
## Research Work
This research work was carried out by Abinesh Mathivanan.