Model Summary
We fine-tune the Llama 3.2 1B language model to convert natural language arithmetic queries into strictly structured, executable calculator tool calls.
The model generates nested YAML-based calculator invocations wrapped inside <calculator> tags, enabling deterministic execution of multi-step arithmetic expressions.
This model is optimized for reasoning-to-structure, not free-form chain-of-thought or natural language explanations.
| Model | Accuracy Before RL (%) | Accuracy After RL (%) |
|---|---|---|
| Llama 3.2 1B Instruct | 24.55 | 51.63 |
| LFM 2.5 1.5B Instruct | 16.46 | 30.28 |
| Qwen 2.5 1.5B Instruct | 51.43 | 59.49 |
| GLM-edge-1.5B Chat | 23.42 | 34.18 |
| Olmo-2 1B Instruct | 22.78 | 20.25 |
Core Capability
Given a math problem in natural language, the model produces exactly one valid calculator call:
- Wrapped in `<calculator> ... </calculator>` tags
- YAML formatted
- Recursively nested
- Using only the operations `add`, `subtract`, `multiply`, `divide`
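This contract is easy to enforce at inference time by validating the decoded text before executing it. A minimal sketch (the `is_valid_call` helper is illustrative and not shipped with the model; it assumes PyYAML is installed, as in the usage script below):

```python
import re
import yaml

ALLOWED_OPS = {"add", "subtract", "multiply", "divide"}

def is_valid_call(text):
    """True iff text contains exactly one <calculator> block whose
    YAML body uses only the four allowed operations."""
    calls = re.findall(r"<calculator>(.*?)</calculator>", text, re.DOTALL)
    if len(calls) != 1:
        return False
    try:
        tree = yaml.safe_load(calls[0])
    except yaml.YAMLError:
        return False

    def ok(node):
        # leaves are plain numbers; internal nodes are operation dicts
        if isinstance(node, (int, float)):
            return True
        if not isinstance(node, dict) or node.get("operation") not in ALLOWED_OPS:
            return False
        return all(ok(x) for x in node.get("operands", []))

    return ok(tree)
```

Rejecting malformed output before execution keeps the downstream calculator deterministic even when the model occasionally drifts from the format.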
Example
Input: Subtract 50 from 100, divide the result by 2, then multiply by 10.

Output:
```
<calculator>
operation: "multiply"
operands:
  - operation: "divide"
    operands:
      - operation: "subtract"
        operands:
          - 100
          - 50
      - 2
  - 10
</calculator>
```
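With the tags stripped, the body parses (e.g. via `yaml.safe_load`) into a plain nested dict, which is what makes deterministic execution straightforward:

```python
import yaml

# The YAML body of the example above, tags removed
body = """
operation: "multiply"
operands:
  - operation: "divide"
    operands:
      - operation: "subtract"
        operands:
          - 100
          - 50
      - 2
  - 10
"""
tree = yaml.safe_load(body)
print(tree["operation"])                 # multiply
print(tree["operands"][0]["operation"])  # divide
# Evaluated innermost-out: ((100 - 50) / 2) * 10 = 250
```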
Training Dataset
The model was trained on a synthetic, curriculum-based dataset of 1,500 training samples and 158 test samples from 🔗 https://github.com/Danau5tin/calculator_agent_rl.
Training Methodology
Optimization Paradigm
- Group Relative Policy Optimization (GRPO)
- Reference-policy-based RL fine-tuning
- vLLM was not used for rollouts due to memory constraints
Hardware
- Single NVIDIA A10 (24GB)
Key Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Learning Rate | 3e-6 (constant, no warmup) |
| Advantage Estimator | GRPO |
| Rollouts per Prompt (G) | 8 |
| Max Prompt Length | 512 |
| Max Completion Length | 128 |
| Batch Size | 8 |
| Gradient Clipping | 1.0 |
| Precision | BF16 |
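With G = 8 rollouts per prompt, GRPO scores each group of completions and normalizes every reward against its own group's mean and standard deviation, so no learned value function is needed. A minimal sketch of the advantage estimator (illustrative, not the training loop used here):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each rollout's reward
    against the mean/std of its own group (G rollouts per prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, G = 8 rollouts, e.g. binary correctness rewards:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
```

Correct rollouts get positive advantages and incorrect ones negative, and the advantages in each group sum to zero by construction.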
Use with Transformers
```python
import torch
import re
import yaml
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AbleCredit/Llama-3.2-1B-Calculator"

# recursive calculator parser
def calculate(node):
    if isinstance(node, (int, float)):
        return float(node)
    op = node.get("operation")
    operands = [calculate(arg) for arg in node.get("operands", [])]
    if op == "add":
        return sum(operands)
    if op == "subtract":
        return operands[0] - operands[1]
    if op == "multiply":
        return operands[0] * operands[1]
    if op == "divide":
        # guard against division by zero
        return operands[0] / operands[1] if operands[1] != 0 else 0
    return 0

# load model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system_prompt = """You are an AI agent that converts math problems into a SINGLE, VALID, NESTED calculator tool call.
Don't generate examples, only consume the question and convert them into tool call inside <calculator> tags.
Strictly don't add any other text.

CRITICAL: Your output MUST be wrapped in <calculator> tags.

Calculator Syntax
- Use YAML format inside the tags.
- VALID OPERATIONS: add, subtract, multiply, divide.

4 Golden Rules of Structure
1. NO FLAT LISTS FOR MIXED MATH
2. FIND THE ROOT FIRST
3. "THEN" MEANS WRAP
4. RESPECT PRECEDENCE

Few-Shot Examples
[Include 1–2 examples here]

User:
<YOUR QUESTION HERE>"""

question = "Multiply 15 by 3, add 5, then divide the whole thing by 10."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question},
]

# generate
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

response = tokenizer.decode(
    outputs[0][len(inputs[0]):],
    skip_special_tokens=True,
)

# extract and execute the tool call
print(f"Model Output:\n{response}")
match = re.search(r"<calculator>(.*?)</calculator>", response, re.DOTALL)
if match:
    yaml_content = match.group(1).strip()
    struct = yaml.safe_load(yaml_content)
    result = calculate(struct)
    print(f"Executed Result: {result}")
```
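The exact reward shaping used during GRPO training is not documented here; a plausible sketch grades each rollout on whether it produces tags at all, whether the body parses, and whether the executed result is correct (the `reward` function, its weights, and the binary-operand evaluator are all illustrative assumptions):

```python
import re
import yaml

# illustrative binary-operand evaluator (the add operation may accept
# more operands in the actual format)
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def evaluate(node):
    if isinstance(node, (int, float)):
        return float(node)
    a, b = (evaluate(x) for x in node["operands"])
    return OPS[node["operation"]](a, b)

def reward(completion, target):
    """Hypothetical graded reward: 0.0 no tags, 0.1 unparsable body,
    0.3 wrong answer, 1.0 correct answer (weights illustrative)."""
    m = re.search(r"<calculator>(.*?)</calculator>", completion, re.DOTALL)
    if not m:
        return 0.0
    try:
        value = evaluate(yaml.safe_load(m.group(1)))
    except Exception:
        return 0.1
    return 1.0 if abs(value - target) < 1e-6 else 0.3
```

A graded signal like this gives GRPO partial credit for getting the format right before the arithmetic is correct, which helps early in training when most rollouts fail outright.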
Research Work
This research was carried out by Abinesh Mathivanan.