Model Summary

We present a Llama-3.2-1B language model fine-tuned to convert natural language arithmetic queries into strictly structured, executable calculator tool calls.
The model generates nested YAML-based calculator invocations wrapped inside <calculator> tags, enabling deterministic execution of multi-step arithmetic expressions.

This model is optimized for reasoning-to-structure, not free-form chain-of-thought or natural language explanations.

Model                  Accuracy Before RL (%)   Accuracy After RL (%)
Llama 3.2 1B Instruct  24.55                    51.63
LFM 2.5 1.5B Instruct  16.46                    30.28
Qwen 2.5 1.5B Instruct 51.43                    59.49
GLM-edge-1.5B Chat     23.42                    34.18
Olmo-2 1B Instruct     22.78                    20.25

Core Capability

Given a math problem in natural language, the model produces exactly one valid calculator call:

  • Wrapped in <calculator> ... </calculator>
  • YAML formatted
  • Recursively nested
  • Using only the operations: add, subtract, multiply, divide

Example

Input: Subtract 50 from 100, divide the result by 2, then multiply by 10.

Output:

  <calculator>
  operation: "multiply"
  operands:
    - operation: "divide"
      operands:
        - operation: "subtract"
          operands:
            - 100
            - 50
        - 2
    - 10
  </calculator>
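As a sanity check, the nested call above can be executed directly with PyYAML (assuming `pyyaml` is installed); the innermost node evaluates first, so the expression is ((100 - 50) / 2) * 10:

```python
import yaml

yaml_doc = """
operation: "multiply"
operands:
  - operation: "divide"
    operands:
      - operation: "subtract"
        operands:
          - 100
          - 50
      - 2
  - 10
"""

def evaluate(node):
    # Numbers are leaves; dicts are nested operations.
    if isinstance(node, (int, float)):
        return float(node)
    ops = [evaluate(x) for x in node["operands"]]
    return {
        "add": sum(ops),
        "subtract": ops[0] - ops[1],
        "multiply": ops[0] * ops[1],
        "divide": ops[0] / ops[1],
    }[node["operation"]]

print(evaluate(yaml.safe_load(yaml_doc)))  # ((100 - 50) / 2) * 10 = 250.0
```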

Training Dataset

The model was trained on a synthetic, curriculum-based dataset (1,500 training samples, 158 test samples) from: 🔗 https://github.com/Danau5tin/calculator_agent_rl


Training Methodology

Optimization Paradigm

  • Group Relative Policy Optimization (GRPO)
  • Reference-policy-based RL fine-tuning
  • No vLLM usage due to rollout memory constraints
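GRPO's core idea, normalizing each rollout's reward against its own group of G samples instead of a learned value baseline, can be sketched in a few lines (illustrative only, not the training code used here; population std is used for simplicity):

```python
import statistics

def grpo_advantages(group_rewards):
    """Group-relative advantages: z-score each reward within its rollout group."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # guard against zero variance
    return [(r - mean) / std for r in group_rewards]

# G = 8 rollouts for one prompt; reward 1.0 = a valid, correct calculator call
rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0]
print(grpo_advantages(rewards))
```

Correct rollouts get a positive advantage and incorrect ones a negative advantage, relative only to their own group.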

Hardware

  • Single NVIDIA A10 (24GB)

Key Hyperparameters

Parameter                  Value
Epochs                     1
Learning Rate              3e-6 (constant, no warmup)
Advantage Estimator        GRPO
Rollouts per Prompt (G)    8
Max Prompt Length          512
Max Completion Length      128
Batch Size                 8
Gradient Clipping          1.0
Precision                  BF16
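If reproducing this run with the trl library (an assumption; the original training stack is not specified here), the hyperparameters above would map roughly to a GRPOConfig like the following sketch:

```python
from trl import GRPOConfig

config = GRPOConfig(
    num_train_epochs=1,
    learning_rate=3e-6,
    lr_scheduler_type="constant",   # constant LR, no warmup
    warmup_steps=0,
    num_generations=8,              # rollouts per prompt (G)
    max_prompt_length=512,
    max_completion_length=128,
    per_device_train_batch_size=8,
    max_grad_norm=1.0,              # gradient clipping
    bf16=True,
)
```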

Use with Transformers

import torch
import re
import yaml
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AbleCredit/Llama-3.2-1B-Calculator"

# Recursively evaluate a parsed calculator call:
# numbers are leaves; dicts hold an operation and its operands.
def calculate(node):
    if isinstance(node, (int, float)):
        return float(node)

    op = node.get("operation")
    operands = [calculate(arg) for arg in node.get("operands", [])]

    if op == "add": return sum(operands)
    if op == "subtract": return operands[0] - operands[1]
    if op == "multiply": return operands[0] * operands[1]
    if op == "divide": return operands[0] / operands[1] if operands[1] != 0 else 0  # guard division by zero
    return 0  # unknown operation

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system_prompt = """You are an AI agent that converts math problems into a SINGLE, VALID, NESTED calculator tool call.

Don't generate examples, only consume the question and convert them into tool call inside <calculator> tags.
Strictly don't add any other text.

CRITICAL: Your output MUST be wrapped in <calculator> tags.

Calculator Syntax
- Use YAML format inside the tags.
- VALID OPERATIONS: add, subtract, multiply, divide.

4 Golden Rules of Structure
1. NO FLAT LISTS FOR MIXED MATH
2. FIND THE ROOT FIRST
3. "THEN" MEANS WRAP
4. RESPECT PRECEDENCE

Few-Shot Examples
[Include 1–2 examples here]

User:
<YOUR QUESTION HERE>"""

question = "Multiply 15 by 3, add 5, then divide the whole thing by 10."
messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question}
]

# generate
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

response = tokenizer.decode(
    outputs[0][len(inputs[0]):],
    skip_special_tokens=True
)


# execute
print(f"Model Output:\n{response}")
match = re.search(r'<calculator>(.*?)</calculator>', response, re.DOTALL)

if match:
    yaml_content = match.group(1).strip()
    struct = yaml.safe_load(yaml_content)
    result = calculate(struct)
    print(f"Executed Result: {result}")
else:
    print("No <calculator> block found in the model output.")

Research Work

This work was carried out by Abinesh Mathivanan.
