---
license: apache-2.0
language:
- en
metrics:
- bertscore
pipeline_tag: text-generation
library_name: transformers
tags:
- math
- calculator
- grpo
---
## Model Summary

We release a fine-tuned version of Qwen3-0.6B (thinking variant), optimized for mathematical reasoning and structured calculator tool calling.
It was trained with **Group Relative Policy Optimization (GRPO)** to externalize arithmetic computations into executable, YAML-based tool calls.
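
GRPO, introduced with DeepSeekMath, drops the separate value model used in PPO. For each prompt it samples a group of G completions and scores each one with a reward. Each completion's advantage is its reward's deviation from the group mean, scaled by the group's standard deviation. The sketch below shows only that normalization step, using made-up rewards for G = 4, the rollout count used in this run. The reward function used to train this model is not described here. Implementations also differ on details such as population versus sample standard deviation, so treat this as an illustration rather than the training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """Standardize each reward against its group: A_i = (r_i - mean) / (std + eps)."""
    mu = mean(rewards)
    # Population std over the G rollouts of one prompt (some implementations use sample std).
    sigma = pstdev(rewards)
    # eps keeps the division finite when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for G = 4 rollouts of one prompt (1.0 = correct answer).
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```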

---

## Model Description

Instead of relying on implicit arithmetic in free-form text, **Qwen3-0.6B-Calculator** follows a strict two-step process:

1.  **Thinking Phase**: Generates internal reasoning within `<thought>` tags.
2.  **Tool Call Phase**: Generates a single, valid, nested YAML expression within `<calculator>` tags.

The model is designed for GSM8K-style word problems. It offloads the final calculation to a deterministic calculator instead of computing it in text, which is intended to reduce arithmetic errors ("hallucinations") in multi-step problems.
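
The nested YAML in the `<calculator>` block is an expression tree. Each node has an `operation` and a list of `operands`, and each operand is either a number or another node. The sketch below starts from the Python dict that `yaml.safe_load` returns for the "Natalia" example in the system prompt of the usage code. It prints that tree as an ordinary infix expression next to its value. The helpers `to_infix` and `evaluate` are hypothetical illustrations, not part of the released code. Left-folding over more than two operands is an assumption of this sketch.

```python
import operator
from functools import reduce

# Parsed form of the system-prompt example (what yaml.safe_load returns for it).
CALL = {
    "operation": "add",
    "operands": [48, {"operation": "divide", "operands": [48, 2]}],
}

SYMBOLS = {"add": "+", "subtract": "-", "multiply": "*", "divide": "/"}
FUNCS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,
}


def to_infix(node) -> str:
    """Render a calculator node as a fully parenthesized infix string."""
    if isinstance(node, (int, float)):
        return str(node)
    symbol = SYMBOLS[node["operation"]]
    return "(" + f" {symbol} ".join(to_infix(o) for o in node["operands"]) + ")"


def evaluate(node) -> float:
    """Evaluate the tree by left-folding each operation over its operands."""
    if isinstance(node, (int, float)):
        return float(node)
    values = [evaluate(o) for o in node["operands"]]
    return reduce(FUNCS[node["operation"]], values)


print(to_infix(CALL), "=", evaluate(CALL))  # (48 + (48 / 2)) = 72.0
```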

---

## Performance (GSM8K Tool-Calling Accuracy)

The model was evaluated on the GSM8K test set (1,319 samples). The table below compares tool-calling accuracy before and after reinforcement learning (RL) with GRPO. It covers Qwen3-0.6B and two other small instruction-tuned models.

| Model | Before RL | After RL (GRPO) | Absolute Improvement (pp) |
| :--- | :---: | :---: | :---: |
| Llama-3.2-1B-Instruct | 4.46% | 14.56% | +10.10 |
| Qwen2.5-1.5B-Instruct | 15.77% | 23.50% | +7.73 |
| **Qwen3-0.6B (Thinking)** | ~0.00% | **49.50%** | **+49.50** |

After training, we also observed a substantial drop in the number of tokens the model generates inside its `<think>` blocks. We interpret this as a sign of a more stable reasoning process.
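
The sketch below shows one way to compute this kind of accuracy. It assumes a sample counts as correct when the executed calculator result matches the GSM8K reference answer within a small tolerance. The reference answer is the number after `####` in each GSM8K solution. A missing or unparseable call counts as wrong. The function names and the tolerance are hypothetical, and the exact scoring used for the table is not specified. In practice the predicted values would come from a helper such as `get_calculator_result` in the usage example.

```python
import math
from typing import Optional


def gold_answer(solution: str) -> float:
    """Extract the final numeric answer from a GSM8K solution ('... #### 72')."""
    final = solution.split("####")[-1].strip().replace(",", "")
    return float(final)


def is_correct(predicted: Optional[float], solution: str, tol: float = 1e-4) -> bool:
    """A sample is correct only if a calculator result exists and matches the gold answer."""
    if predicted is None:
        return False
    return math.isclose(predicted, gold_answer(solution), rel_tol=0.0, abs_tol=tol)


def tool_call_accuracy(predictions: list, solutions: list) -> float:
    """Percentage of samples whose executed calculator call is correct."""
    hits = sum(is_correct(p, s) for p, s in zip(predictions, solutions))
    return 100.0 * hits / len(solutions)


# Toy data: two GSM8K-style solutions and the calculator outputs for them.
solutions = [
    "Natalia sold 48/2 = 24 clips in May. 48+24 = 72 #### 72",
    "She has 1,200 - 200 = 1,000 left. #### 1,000",
]
predictions = [72.0, None]  # second sample: no valid <calculator> block
print(f"{tool_call_accuracy(predictions, solutions):.2f}%")
```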

---

## Training Procedure

The model was trained on a single NVIDIA A10 GPU (24 GB) with the following configuration:

| Parameter            | Value                                     |
|----------------------|-------------------------------------------|
| Method               | GRPO                                      |
| Dataset              | GSM8K (7,470 training samples)            |
| Learning Rate        | 1e-5 (cosine scheduler, warmup ratio 0.1) |
| Rollouts (G)         | 4 generations per prompt                  |
| Batch Size           | 4                                         |
| Max Output Length    | 512 tokens                                |
| Precision            | BF16                                      |
| Sampling Temperature | 0.6                                       |

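
The learning-rate entry describes a warmup followed by cosine decay. The sketch below assumes the common convention used by Hugging Face's `get_cosine_schedule_with_warmup`: linear warmup over the first 10% of steps, then a half-cosine decay to zero. Under that assumption, it computes the learning rate at each optimizer step. The total number of training steps is not reported, so `TOTAL = 1000` is a placeholder, and `lr_at` is a hypothetical helper.

```python
import math


def lr_at(step: int, total_steps: int, base_lr: float = 1e-5, warmup_ratio: float = 0.1) -> float:
    """Linear warmup for the first warmup_ratio of steps, then cosine decay to 0."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        # Ramp linearly from 0 up to base_lr.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, in [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


# Placeholder run length: the actual number of optimizer steps is not reported.
TOTAL = 1000
for step in (0, 50, 100, 550, 1000):
    print(step, f"{lr_at(step, TOTAL):.2e}")
```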

---

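## Use with Transformers

To use this model, you need parsing logic that extracts the YAML calculator call from the model's response and executes it. Use `transformers>=4.51.0`. Older versions do not recognize the Qwen3 architecture and fail with `KeyError: 'qwen3'`. The example below also requires `pyyaml`, plus `accelerate` for `device_map="auto"`.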
```python
import re

import yaml  # pip install pyyaml
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_PATH = "AbleCredit/Qwen3-0.6B-Calculator"
SYSTEM_PROMPT = """You are a mathematical reasoning agent.
1. Break down the problem into logical steps inside <thought> tags.
2. Convert the final expression into a SINGLE, VALID, NESTED calculator tool call inside <calculator> tags using YAML.
Operations: add, subtract, multiply, divide.
Example:
<thought>Natalia sold 48 clips in April. In May she sold half: 48/2=24. Total: 48+24=72.</thought>
<calculator>
operation: "add"
operands:
  - 48
  - operation: "divide"
    operands: [48, 2]
</calculator>"""


# --- calculator: parse the YAML tool call and evaluate it ---
def clean_yaml_load(text):
    """Strip YAML comments (anything after '#') and parse the remaining text."""
    text = re.sub(r'#.*', '', text)
    return yaml.safe_load(text)


def calculate_recursive(data):
    """Evaluate a nested {operation, operands} tree; malformed nodes evaluate to 0.0."""
    if isinstance(data, (int, float)):
        return float(data)
    if not isinstance(data, dict):
        # Leaf given as a string (e.g. "48"); anything non-numeric counts as 0.0.
        try:
            return float(str(data))
        except ValueError:
            return 0.0
    op = str(data.get('operation') or '').lower()
    operands = data.get('operands', [])
    if not operands:
        return 0.0
    vals = [calculate_recursive(o) for o in operands]
    if op == 'add':
        return sum(vals)
    if op == 'subtract':
        # First operand minus all remaining operands, e.g. [30, 5, 3] -> 22.
        return vals[0] - sum(vals[1:])
    if op == 'multiply':
        res = 1.0
        for x in vals:
            res *= x
        return res
    if op == 'divide':
        # Left-to-right division; a zero divisor yields 0.0 instead of raising.
        res = vals[0]
        for x in vals[1:]:
            if x == 0:
                return 0.0
            res /= x
        return res
    return 0.0  # unknown operation


def get_calculator_result(content_text):
    """Return the value of the first <calculator> call in the text, or None if missing/invalid."""
    match = re.search(r'<calculator>(.*?)</calculator>', content_text, re.DOTALL)
    if not match:
        return None
    try:
        data = clean_yaml_load(match.group(1).strip())
        return calculate_recursive(data)
    except Exception:  # malformed YAML or unexpected structure
        return None


# --- inference ---
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto")
question = "Janet has 30 apples. She gives 5 to her sister and 3 to her brother. Then she buys twice as many as she has left. How many apples does she have now?"
messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": question},
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
# Sample with the same temperature used during training (0.6).
generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
# Keep only the newly generated tokens (drop the prompt).
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
response = tokenizer.decode(output_ids, skip_special_tokens=True)

# --- execute the tool call ---
predicted_val = get_calculator_result(response)
print(f"Model Response:\n{response}")
print(f"Final Calculated Answer: {predicted_val}")
```
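
`calculate_recursive` maps malformed calls to 0.0. That covers unknown operations, non-numeric leaves and division by zero, and the resulting 0.0 can be silently scored as a real answer. If you would rather have such calls fail loudly, for example when measuring accuracy, a stricter variant can raise an error instead. The sketch below is a hypothetical alternative, not part of the released code. It takes the already-parsed YAML (a Python dict), so it needs only the standard library. Requiring at least two operands per operation is a design choice of this sketch.

```python
import operator
from functools import reduce

OPS = {
    "add": operator.add,
    "subtract": operator.sub,
    "multiply": operator.mul,
    "divide": operator.truediv,  # raises ZeroDivisionError on a zero divisor
}


def strict_calculate(node):
    """Evaluate a parsed calculator tree, raising ValueError on malformed input."""
    if isinstance(node, bool):
        raise ValueError("booleans are not valid operands")
    if isinstance(node, (int, float)):
        return float(node)
    if not isinstance(node, dict):
        raise ValueError(f"unexpected operand: {node!r}")
    op = node.get("operation")
    if not isinstance(op, str) or op.lower() not in OPS:
        raise ValueError(f"unknown operation: {op!r}")
    operands = node.get("operands")
    if not isinstance(operands, list) or len(operands) < 2:
        raise ValueError(f"{op} needs a list of at least two operands")
    values = [strict_calculate(o) for o in operands]
    # Left fold: subtract/divide apply left to right, e.g. [30, 5, 3] -> 30 - 5 - 3.
    return reduce(OPS[op.lower()], values)


example = {"operation": "add", "operands": [48, {"operation": "divide", "operands": [48, 2]}]}
print(strict_calculate(example))
```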

---

## Research Work

This work was carried out by [Abinesh Mathivanan](https://www.linkedin.com/in/abineshmathivanan/).