Beens committed on
Commit 7511080 · verified · 1 Parent(s): 11fffe5

Update README.md

Files changed (1)
  1. README.md +150 -3
README.md CHANGED
@@ -1,3 +1,150 @@
- ---
- license: apache-2.0
- ---
+ ---
+ license: apache-2.0
+ language:
+ - en
+ metrics:
+ - bertscore
+ pipeline_tag: text-generation
+ library_name: transformers
+ tags:
+ - math
+ - calculator
+ - grpo
+ ---
+
+ ## Model Summary
+
+ We release a fine-tuned version of the Qwen3-0.6B model (thinking variant), optimized for mathematical reasoning and structured calculator tool calling.
+ It was trained with **Group Relative Policy Optimization (GRPO)** to externalize arithmetic computations into executable YAML-based tool calls.
+
+ ---
+ ## Model Description
+
+ Instead of relying on implicit arithmetic in free-form text, **Qwen3-0.6B-Calculator** follows a strict two-step process:
+ 1. **Thinking Phase**: Generates internal reasoning within `<thought>` tags.
+ 2. **Tool Call Phase**: Generates a single, valid, nested YAML expression within `<calculator>` tags.
+
+ The model is specifically designed to solve GSM8K-style word problems by offloading the final calculation to a deterministic calculator, significantly reducing "hallucination" in multi-step arithmetic.
+
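The two-phase format described above can be checked mechanically before any tool call is executed. The sketch below is one possible validator, not part of the released code. The name `has_valid_format` is hypothetical, and the strictness it enforces (exactly one `<thought>` block followed by exactly one `<calculator>` block, with nothing else in the response) is an assumption about what "strict" means here.

```python
import re

# Hypothetical pattern: a <thought> block, then a <calculator> block, nothing else.
_FORMAT = re.compile(
    r"\s*<thought>.*?</thought>\s*<calculator>.*?</calculator>\s*",
    re.DOTALL,
)


def has_valid_format(response: str) -> bool:
    """Return True if the response is exactly one thought + one calculator block."""
    tags = ("<thought>", "</thought>", "<calculator>", "</calculator>")
    # Each opening and closing tag must appear exactly once.
    if any(response.count(tag) != 1 for tag in tags):
        return False
    # The two blocks must appear in order and cover the whole response.
    return _FORMAT.fullmatch(response) is not None
```

A check like this only validates the outer structure. Whether the YAML inside `<calculator>` parses is a separate question.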
+ ---
+ ## Performance (GSM8K Tool-Calling Accuracy)
+
+ The model was evaluated on the GSM8K test set (1,319 samples). The table below compares tool-calling accuracy before and after reinforcement learning (RL) for this model and two other base models.
+
+ | Model | Before RL | After RL (GRPO) | Absolute Improvement (pp) |
+ | :--- | :---: | :---: | :---: |
+ | Llama 3.2-1B Instruct | 4.46% | 14.56% | +10.10 |
+ | Qwen 2.5-1.5B Instruct | 15.77% | 23.50% | +7.73 |
+ | **Qwen3-0.6B (Thinking)** | ~0.00% | **49.50%** | **+49.50** |
+
+ We also observed a significant decrease in `<think>` tokens, which often indicates a more stable reasoning process.
+
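The scoring rule behind "tool-calling accuracy" is not spelled out here. One plausible reading, sketched below as an assumption, counts a sample as correct when the executed calculator result matches the GSM8K reference answer, which GSM8K places after a `####` marker at the end of each solution. The helper names and the tolerance are hypothetical.

```python
import math
from typing import Optional, Sequence


def gold_answer(solution: str) -> float:
    """Extract the number after the final '####' of a GSM8K solution string."""
    raw = solution.split("####")[-1].strip().replace(",", "")
    return float(raw)


def tool_call_accuracy(
    predictions: Sequence[Optional[float]],
    solutions: Sequence[str],
    tol: float = 1e-4,  # assumed tolerance for float comparison
) -> float:
    """Fraction of samples whose calculator result matches the reference."""
    if not solutions:
        return 0.0
    correct = 0
    for pred, solution in zip(predictions, solutions):
        # None stands for "no parseable <calculator> block was produced".
        if pred is not None and math.isclose(pred, gold_answer(solution), abs_tol=tol):
            correct += 1
    return correct / len(solutions)
```

Under this reading, a response with sound reasoning but a malformed tool call still counts as wrong, which fits a near-zero score before RL.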
+ ---
+ ## Training Procedure
+
+ The model was trained on a single NVIDIA 10 (24GB) GPU.
+
+ | Parameter | Value |
+ |--------------------------|-----------------------------------------|
+ | Method | GRPO |
+ | Dataset | GSM8K (7,470 training samples) |
+ | Learning Rate | 1e-5 (Cosine scheduler, 0.1 warmup) |
+ | Rollouts (G) | 4 generations per prompt |
+ | Batch Size | 4 |
+ | Max Output Length | 512 tokens |
+ | Precision | BF16 |
+ | Sampling Temperature | 0.6 |
+
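With four rollouts per prompt, GRPO scores each completion relative to the other completions sampled for the same prompt rather than with a learned value function. A minimal sketch of that group-relative advantage follows. The reward values are made up for illustration, and the epsilon term and the use of the population standard deviation are assumptions (implementations differ on both).

```python
from statistics import mean, pstdev
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """Normalize rewards within one group of rollouts for a single prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group (assumption)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for G = 4 rollouts: 1.0 for a correct tool call, else 0.0.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

When every rollout in a group receives the same reward, all advantages are zero and that prompt contributes no learning signal.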
+ ---
+ ## Use with Transformers
+
+ To use this model, you must implement the parsing logic that extracts and executes the YAML calculator calls. We recommend `transformers>=4.51.0` to avoid the `KeyError: 'qwen3'` raised by older versions.
+
+ ```python
+ import torch
+ import re
+ import yaml
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ MODEL_PATH = "AbleCredit/Qwen3-0.6B-Calculator"
+
+ SYSTEM_PROMPT = """You are a mathematical reasoning agent.
+ 1. Break down the problem into logical steps inside <thought> tags.
+ 2. Convert the final expression into a SINGLE, VALID, NESTED calculator tool call inside <calculator> tags using YAML.
+
+ Operations: add, subtract, multiply, divide.
+ Example:
+ <thought>Natalia sold 48 clips in April. In May she sold half: 48/2=24. Total: 48+24=72.</thought>
+ <calculator>
+ operation: "add"
+ operands:
+   - 48
+   - operation: "divide"
+     operands: [48, 2]
+ </calculator>"""
+
+ # calculator: parse and evaluate the YAML tool call
+ def clean_yaml_load(text):
+     # Drop everything from '#' to end of line so trailing comments never reach the parser.
+     text = re.sub(r'#.*', '', text)
+     return yaml.safe_load(text)
+
+ def calculate_recursive(data):
+     # Leaf node: a plain number.
+     if isinstance(data, (int, float)):
+         return float(data)
+     # Any other non-dict value: try to coerce it (e.g. "48"), else fall back to 0.0.
+     if not isinstance(data, dict):
+         try:
+             return float(str(data))
+         except (TypeError, ValueError):
+             return 0.0
+
+     op = data.get('operation', '').lower()
+     operands = data.get('operands', [])
+     if not operands:
+         return 0.0
+
+     # Evaluate nested operations first.
+     vals = [calculate_recursive(o) for o in operands]
+
+     if op == 'add':
+         return sum(vals)
+     if op == 'subtract':
+         # Only the first two operands are used.
+         return vals[0] - (vals[1] if len(vals) > 1 else 0)
+     if op == 'multiply':
+         res = 1
+         for x in vals:
+             res *= x
+         return res
+     if op == 'divide':
+         # Only the first two operands are used; a missing or zero divisor yields 0.
+         return vals[0] / vals[1] if (len(vals) > 1 and vals[1] != 0) else 0
+     # Unknown operation.
+     return 0.0
+
+ def get_calculator_result(content_text):
+     # Returns None when no <calculator> block is found or parsing fails.
+     try:
+         match = re.search(r'<calculator>(.*?)</calculator>', content_text, re.DOTALL)
+         if not match:
+             return None
+         data = clean_yaml_load(match.group(1).strip())
+         return calculate_recursive(data)
+     except Exception:
+         return None
+
+ # inference
+ tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
+ model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto")
+
+ question = "Janet has 30 apples. She gives 5 to her sister and 3 to her brother. Then she buys twice as many as she has left. How many apples does she have now?"
+
+ messages = [
+     {"role": "system", "content": SYSTEM_PROMPT},
+     {"role": "user", "content": question}
+ ]
+
+ text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
+ model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
+
+ # temperature only takes effect when sampling is enabled
+ generated_ids = model.generate(**model_inputs, max_new_tokens=512, do_sample=True, temperature=0.6)
+ # keep only the newly generated tokens (strip the prompt)
+ output_ids = generated_ids[0][len(model_inputs.input_ids[0]):]
+ response = tokenizer.decode(output_ids, skip_special_tokens=True)
+
+ # execute
+ predicted_val = get_calculator_result(response)
+
+ print(f"Model Response:\n{response}")
+ print(f"Final Calculated Answer: {predicted_val}")
+ ```
+
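The evaluator above returns `0.0` for unknown operations, missing operands, and division by zero, so a malformed tool call cannot be told apart from a legitimate zero result. If that matters for your use, a stricter variant can raise instead. The sketch below is an assumed alternative, not the released implementation: `evaluate_strict` is a hypothetical name, it takes an already-parsed mapping (the kind of object `yaml.safe_load` returns), it requires exactly two operands for `subtract` and `divide`, and it rejects non-numeric leaves rather than coercing them.

```python
import math
from typing import Any


def evaluate_strict(node: Any) -> float:
    """Evaluate a parsed calculator call, raising ValueError on malformed input."""
    if isinstance(node, bool):
        raise ValueError(f"unexpected boolean operand: {node!r}")
    if isinstance(node, (int, float)):
        return float(node)
    if not isinstance(node, dict):
        raise ValueError(f"unexpected operand: {node!r}")

    op = str(node.get("operation", "")).lower()
    operands = node.get("operands")
    if not isinstance(operands, list) or not operands:
        raise ValueError("operation needs a non-empty operand list")
    vals = [evaluate_strict(o) for o in operands]

    if op == "add":
        return sum(vals)
    if op == "multiply":
        return math.prod(vals)
    if op in ("subtract", "divide"):
        if len(vals) != 2:
            raise ValueError(f"{op} expects exactly two operands")
        if op == "subtract":
            return vals[0] - vals[1]
        if vals[1] == 0:
            raise ValueError("division by zero")
        return vals[0] / vals[1]
    raise ValueError(f"unknown operation: {op!r}")


# One possible tool call for the sample question, read as (30 - 5 - 3) * 3.
call = {
    "operation": "multiply",
    "operands": [
        {"operation": "subtract",
         "operands": [{"operation": "subtract", "operands": [30, 5]}, 3]},
        3,
    ],
}
result = evaluate_strict(call)
```

Raising on bad input makes it possible to log or count malformed calls separately from wrong answers. The lenient evaluator above instead keeps an evaluation loop running without extra error handling.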
+ ---
+ ## Research Work
+
+ This research work was carried out by [Abinesh Mathivanan](https://www.linkedin.com/in/abineshmathivanan/).