Model Summary
We fine-tune the Llama 3.2 1B language model to convert natural language arithmetic queries into strictly structured, executable calculator tool calls.
The model generates nested YAML-based calculator invocations wrapped inside <calculator> tags, enabling deterministic execution of multi-step arithmetic expressions.
This model is optimized for reasoning-to-structure, not free-form chain-of-thought or natural language explanations.
| Model | Accuracy Before RL (%) | Accuracy After RL (%) |
|---|---|---|
| Llama 3.2 1B Instruct | 24.55 | 51.63 |
| LFM 2.5 1.5B Instruct | 16.46 | 30.28 |
| Qwen 2.5 1.5B Instruct | 51.43 | 59.49 |
| GLM-edge-1.5B Chat | 23.42 | 34.18 |
| Olmo-2 1B Instruct | 22.78 | 20.25 |
Core Capability
Given a math problem in natural language, the model produces exactly one valid calculator call:
- Wrapped in `<calculator> ... </calculator>` tags
- YAML formatted
- Recursively nested
- Using only the operations `add`, `subtract`, `multiply`, `divide`
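This contract is easy to enforce at inference time by validating the decoded text before executing it. A minimal sketch (the `is_valid_call` helper is illustrative and not shipped with the model; it assumes PyYAML is installed, as in the usage script below):

```python
import re
import yaml

ALLOWED_OPS = {"add", "subtract", "multiply", "divide"}

def is_valid_call(text):
    """True iff text contains exactly one <calculator> block whose
    YAML body uses only the four allowed operations."""
    calls = re.findall(r"<calculator>(.*?)</calculator>", text, re.DOTALL)
    if len(calls) != 1:
        return False
    try:
        tree = yaml.safe_load(calls[0])
    except yaml.YAMLError:
        return False

    def ok(node):
        # leaves are plain numbers; internal nodes are operation dicts
        if isinstance(node, (int, float)):
            return True
        if not isinstance(node, dict) or node.get("operation") not in ALLOWED_OPS:
            return False
        return all(ok(x) for x in node.get("operands", []))

    return ok(tree)
```

Rejecting malformed output before execution keeps the downstream calculator deterministic even when the model occasionally drifts from the format.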
Example
Input: Subtract 50 from 100, divide the result by 2, then multiply by 10.

Output:
```
<calculator>
operation: "multiply"
operands:
  - operation: "divide"
    operands:
      - operation: "subtract"
        operands:
          - 100
          - 50
      - 2
  - 10
</calculator>
```
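With the tags stripped, the body parses (e.g. via `yaml.safe_load`) into a plain nested dict, which is what makes deterministic execution straightforward:

```python
import yaml

# The YAML body of the example above, tags removed
body = """
operation: "multiply"
operands:
  - operation: "divide"
    operands:
      - operation: "subtract"
        operands:
          - 100
          - 50
      - 2
  - 10
"""
tree = yaml.safe_load(body)
print(tree["operation"])                 # multiply
print(tree["operands"][0]["operation"])  # divide
# Evaluated innermost-out: ((100 - 50) / 2) * 10 = 250
```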
Training Dataset
The model was trained on a synthetic, curriculum-based dataset of 1,500 training samples and 158 test samples from 🔗 https://github.com/Danau5tin/calculator_agent_rl.
Training Methodology
Optimization Paradigm
- Group Relative Policy Optimization (GRPO)
- Reference-policy-based RL fine-tuning
- vLLM was not used for rollouts due to memory constraints
Hardware
- Single NVIDIA A10 (24GB)
Key Hyperparameters
| Parameter | Value |
|---|---|
| Epochs | 1 |
| Learning Rate | 3e-6 (constant, no warmup) |
| Advantage Estimator | GRPO |
| Rollouts per Prompt (G) | 8 |
| Max Prompt Length | 512 |
| Max Completion Length | 128 |
| Batch Size | 8 |
| Gradient Clipping | 1.0 |
| Precision | BF16 |
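With G = 8 rollouts per prompt, GRPO scores each group of completions and normalizes every reward against its own group's mean and standard deviation, so no learned value function is needed. A minimal sketch of the advantage estimator (illustrative, not the training loop used here):

```python
import statistics

def grpo_advantages(rewards, eps=1e-6):
    """Group-relative advantages: z-score each rollout's reward
    against the mean/std of its own group (G rollouts per prompt)."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One prompt, G = 8 rollouts, e.g. binary correctness rewards:
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0])
```

Correct rollouts get positive advantages and incorrect ones negative, and the advantages in each group sum to zero by construction.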
Use with Transformers
```python
import torch
import re
import yaml
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "AbleCredit/Llama-3.2-1B-Calculator"

# recursive calculator parser
def calculate(node):
    if isinstance(node, (int, float)):
        return float(node)
    op = node.get("operation")
    operands = [calculate(arg) for arg in node.get("operands", [])]
    if op == "add":
        return sum(operands)
    if op == "subtract":
        return operands[0] - operands[1]
    if op == "multiply":
        return operands[0] * operands[1]
    if op == "divide":
        # guard against division by zero
        return operands[0] / operands[1] if operands[1] != 0 else 0
    return 0

# load model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

system_prompt = """You are an AI agent that converts math problems into a SINGLE, VALID, NESTED calculator tool call.
Don't generate examples, only consume the question and convert them into tool call inside <calculator> tags.
Strictly don't add any other text.

CRITICAL: Your output MUST be wrapped in <calculator> tags.

Calculator Syntax
- Use YAML format inside the tags.
- VALID OPERATIONS: add, subtract, multiply, divide.

4 Golden Rules of Structure
1. NO FLAT LISTS FOR MIXED MATH
2. FIND THE ROOT FIRST
3. "THEN" MEANS WRAP
4. RESPECT PRECEDENCE

Few-Shot Examples
[Include 1–2 examples here]

User:
<YOUR QUESTION HERE>"""

question = "Multiply 15 by 3, add 5, then divide the whole thing by 10."

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": question},
]

# generate
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    outputs = model.generate(inputs, max_new_tokens=256, do_sample=False)

response = tokenizer.decode(
    outputs[0][len(inputs[0]):],
    skip_special_tokens=True,
)

# extract and execute the tool call
print(f"Model Output:\n{response}")
match = re.search(r"<calculator>(.*?)</calculator>", response, re.DOTALL)
if match:
    yaml_content = match.group(1).strip()
    struct = yaml.safe_load(yaml_content)
    result = calculate(struct)
    print(f"Executed Result: {result}")
```
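The exact reward shaping used during GRPO training is not documented here; a plausible sketch grades each rollout on whether it produces tags at all, whether the body parses, and whether the executed result is correct (the `reward` function, its weights, and the binary-operand evaluator are all illustrative assumptions):

```python
import re
import yaml

# illustrative binary-operand evaluator (the add operation may accept
# more operands in the actual format)
OPS = {
    "add": lambda a, b: a + b,
    "subtract": lambda a, b: a - b,
    "multiply": lambda a, b: a * b,
    "divide": lambda a, b: a / b,
}

def evaluate(node):
    if isinstance(node, (int, float)):
        return float(node)
    a, b = (evaluate(x) for x in node["operands"])
    return OPS[node["operation"]](a, b)

def reward(completion, target):
    """Hypothetical graded reward: 0.0 no tags, 0.1 unparsable body,
    0.3 wrong answer, 1.0 correct answer (weights illustrative)."""
    m = re.search(r"<calculator>(.*?)</calculator>", completion, re.DOTALL)
    if not m:
        return 0.0
    try:
        value = evaluate(yaml.safe_load(m.group(1)))
    except Exception:
        return 0.1
    return 1.0 if abs(value - target) < 1e-6 else 0.3
```

A graded signal like this gives GRPO partial credit for getting the format right before the arithmetic is correct, which helps early in training when most rollouts fail outright.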
Research Work
This research was carried out by Abinesh Mathivanan.