# Model Card for Kirim-1-Math

## Model Details

### Model Description

**Kirim-1-Math** is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series with built-in tool-calling capabilities, which let it call external tools for arithmetic, symbolic manipulation, and code execution when a problem needs a numerical solution.

- **Developed by:** Kirim AI Team
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Language(s):** Chinese, English
- **License:** Apache 2.0
- **Base Model:** Kirim-V1-base (expanded from 13B to 30B)
- **Specialization:** Mathematical reasoning, theorem proving, symbolic computation

### Model Capabilities

- **Mathematical Reasoning**: Solve problems from elementary to olympiad level
- **Tool Calling**: Invoke tools for arithmetic, symbolic solving, differentiation, integration, and code execution
- **Step-by-Step Solutions**: Show detailed work for problem-solving
- **LaTeX Output**: Format mathematical expressions properly
- **Bilingual**: Handle problems in both Chinese and English
- **Code Generation**: Write and execute Python/SymPy code for numerical solutions

## Model Sources

- **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
- **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
- **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
- **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)

## Uses

### Direct Use

The model can be used directly for:

- **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
- **Homework Assistance**: Work through problems across a wide range of difficulty levels
- **Competition Preparation**: Practice for AMC, AIME, IMO, Putnam
- **Research Assistance**: Verify proofs and perform symbolic computations
- **Code-Assisted Problem Solving**: Use numerical methods for complex calculations

### Downstream Use

Potential fine-tuning targets include:

- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems

### Out-of-Scope Use

The model should NOT be used for:

- **Academic dishonesty**: Cheating on exams or assignments
- **Safety-critical systems**: Any use without human verification (e.g., structural engineering calculations)
- **Financial advice**: Trading or investment decisions without expert review
- **Medical calculations**: Drug dosages or medical equipment calibration
- **Legal matters**: Any use without verification by a qualified mathematician or lawyer

## Bias, Risks, and Limitations

### Known Limitations

**Technical Limitations:**
- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems

**Reasoning Limitations:**
- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution

**Computational Limitations:**
- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May time out on very long computations
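
The precision limitation is easy to reproduce in plain Python: binary floating point cannot represent most decimal fractions exactly, while the standard-library `fractions.Fraction` keeps rational arithmetic exact. This is a generic illustration, not a description of how the model's own tools handle precision.

```python
from fractions import Fraction
import math

# Binary floating point: 0.1 and 0.2 are stored as the nearest doubles, so rounding error appears.
float_sum = 0.1 + 0.2
print(float_sum)          # 0.30000000000000004
print(float_sum == 0.3)   # False

# Exact rational arithmetic avoids the error for rational inputs.
exact_sum = Fraction(1, 10) + Fraction(2, 10)
print(exact_sum)                     # 3/10
print(exact_sum == Fraction(3, 10))  # True

# When floats are unavoidable, compare with a tolerance instead of ==.
print(math.isclose(float_sum, 0.3))  # True
```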

### Risks and Biases

**Potential Risks:**
- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources
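
One common safeguard against runaway tool execution is to run generated code in a separate interpreter process with a hard time limit. The sketch below uses only the Python standard library; `run_generated_code` is a hypothetical helper, not part of the Kirim tooling, and a subprocess on its own is not a security sandbox (filesystem, network, and memory limits still have to come from the host).

```python
import subprocess
import sys

def run_generated_code(code: str, timeout_s: float = 5.0) -> dict:
    """Run model-generated Python in a fresh interpreter with a wall-clock limit.

    Hypothetical helper: a subprocess isolates crashes and hangs, but it is NOT
    a security sandbox by itself.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode (ignores PYTHON* env vars, user site-packages)
            capture_output=True,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child if the limit is exceeded
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "stdout": "", "error": f"timed out after {timeout_s} s"}
    return {"ok": proc.returncode == 0, "stdout": proc.stdout, "error": proc.stderr}

print(run_generated_code("print(sum(range(101)))")["stdout"])
print(run_generated_code("while True: pass", timeout_s=1.0)["ok"])  # False
```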

**Mitigation Strategies:**
- Always verify critical results with human experts
- Use a low sampling temperature (e.g., 0.1) for more consistent reasoning, or greedy decoding (`do_sample=False`) when fully deterministic output is needed
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods
- Implement appropriate safeguards in educational settings
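
As a concrete instance of cross-checking with multiple methods, the plain-Python sketch below solves the quadratic used in the Basic Usage example with the quadratic formula, then verifies the candidate roots two independent ways: substitution back into the polynomial and Vieta's formulas. It is independent of the model's own tools.

```python
import math

def solve_quadratic(a: float, b: float, c: float) -> list:
    """Real roots of a*x**2 + b*x + c = 0 (a != 0), in ascending order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    sq = math.sqrt(disc)
    # A set collapses the repeated root when disc == 0.
    return sorted({(-b - sq) / (2 * a), (-b + sq) / (2 * a)})

def cross_check(a: float, b: float, c: float, roots: list, tol: float = 1e-9) -> bool:
    """Verify candidate roots by two independent methods."""
    # Method 1: substitute each root back into the polynomial.
    substitution_ok = all(abs(a * r * r + b * r + c) <= tol for r in roots)
    # Method 2: Vieta's formulas relate the roots to the coefficients.
    if len(roots) == 2:
        vieta_ok = (math.isclose(roots[0] + roots[1], -b / a, abs_tol=tol)
                    and math.isclose(roots[0] * roots[1], c / a, abs_tol=tol))
    elif len(roots) == 1:  # repeated root r satisfies 2r = -b/a
        vieta_ok = math.isclose(2 * roots[0], -b / a, abs_tol=tol)
    else:
        vieta_ok = True  # nothing to compare when there are no real roots
    return substitution_ok and vieta_ok

roots = solve_quadratic(1, -5, 6)    # x² - 5x + 6 = 0
print(roots)                         # [2.0, 3.0]
print(cross_check(1, -5, 6, roots))  # True
```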

## How to Get Started

### Installation

```bash
pip install torch transformers accelerate sympy
```

### Basic Usage

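```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kirim-ai/Kirim-1-Math"

# Load model (device_map="auto" places the weights on the available devices)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    trust_remote_code=True
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]

# add_generation_prompt=True appends the assistant-turn prefix so the model replies
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

# temperature is only applied when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)

# Decode only the newly generated tokens, skipping the echoed prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))
```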

### Using the Inference Script

```bash
# Interactive mode
python inference_math.py --interactive

# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"

# With quantization
python inference_math.py --load_in_4bit --interactive
```

## Training Details

### Training Data

**Mathematical Corpus (500B tokens):**
- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)

**Code Corpus (200B tokens):**
- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations

**General Corpus (800B tokens):**
- From Kirim-V1-base pre-training

**Total: 1.5 trillion tokens**
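
The mix above works out as follows. The numbers are copied from this card; the snippet only checks that the components add up to the stated totals and reports each source's share of the 1.5T-token budget.

```python
# Token counts in billions, as listed in the Training Data section.
math_corpus = {
    "proofs (ProofWiki, Lean, Coq, Isabelle)": 125,
    "olympiad problems": 150,
    "arXiv papers": 100,
    "textbooks": 75,
    "Q&A (Math StackExchange, MathOverflow)": 50,
}
corpora = {"math": sum(math_corpus.values()), "code": 200, "general": 800}
total = sum(corpora.values())

assert corpora["math"] == 500  # the sub-corpora add up to the stated 500B
for name, tokens in corpora.items():
    print(f"{name:>8}: {tokens:>4}B tokens ({tokens / total:.1%})")
print(f"   total: {total}B tokens")
```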

### Training Procedure

#### Stage 1: Model Expansion (15 days)
- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48
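
The card does not say how width and depth were grown. One published function-preserving technique for widening a layer is Net2WiderNet (from Net2Net, Chen et al., 2016): new hidden units copy existing ones and their outgoing weights are split among the copies, so the widened network computes the same function before further training. The NumPy sketch below applies that idea to a toy two-layer ReLU MLP; it is illustrative only and not the Kirim expansion procedure, which would also have to handle attention, normalization, and residual widths.

```python
import numpy as np

def net2wider(w1, b1, w2, new_width, rng):
    """Widen the hidden layer of y = w2 @ relu(w1 @ x + b1) without changing y.

    Shapes: w1 (hidden, in), b1 (hidden,), w2 (out, hidden).
    """
    hidden = w1.shape[0]
    if new_width < hidden:
        raise ValueError("new_width must be >= current width")
    # Keep every original unit, then duplicate randomly chosen units to fill the new slots.
    mapping = np.concatenate([np.arange(hidden), rng.integers(0, hidden, new_width - hidden)])
    copies = np.bincount(mapping, minlength=hidden)  # slots occupied by each original unit
    w1_new = w1[mapping]                             # incoming weights copied unchanged
    b1_new = b1[mapping]
    w2_new = w2[:, mapping] / copies[mapping]        # outgoing weights split among the copies
    return w1_new, b1_new, w2_new

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
w1, b1, w2 = rng.normal(size=(4, 3)), rng.normal(size=4), rng.normal(size=(2, 4))
x = rng.normal(size=3)

w1w, b1w, w2w = net2wider(w1, b1, w2, new_width=6, rng=rng)
y_before = w2 @ relu(w1 @ x + b1)
y_after = w2w @ relu(w1w @ x + b1w)
print(w1w.shape, np.allclose(y_before, y_after))  # (6, 3) True
```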

#### Stage 2: Mathematical Pre-training (30 days)
- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision

#### Stage 3: Instruction Tuning (5 days)
- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs

#### Stage 4: Tool Calling Training (3 days)
- 50K tool-calling examples
- Function definition and execution
- Error handling and recovery

#### Stage 5: Reinforcement Learning (7 days)
- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification
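
The card describes the Stage 5 reward only as solution correctness checked symbolically and numerically. A minimal sketch of such a binary reward, assuming the final answer has already been extracted as a plain expression string, can use SymPy (already in the install list): first test whether the difference simplifies to zero, then fall back to evaluating it at a few sample points. `correctness_reward` and the sample points are hypothetical choices, and `sympify` evaluates strings with `eval`, so untrusted model output should only be parsed inside a sandbox.

```python
import sympy

def correctness_reward(predicted: str, reference: str, tol: float = 1e-9) -> float:
    """Hypothetical reward: 1.0 if the expressions agree, else 0.0.

    sympify() uses eval(), so call this on untrusted output only inside a sandbox.
    """
    try:
        diff = sympy.sympify(predicted) - sympy.sympify(reference)
        # Symbolic check: the difference simplifies exactly to zero.
        if sympy.simplify(diff) == 0:
            return 1.0
        # Numerical check: evaluate the difference at a few arbitrary sample points.
        free = sorted(diff.free_symbols, key=str)
        for point in (0.37, 1.91, -2.63):
            value = complex(diff.subs({s: point for s in free}).evalf())
            if not abs(value) <= tol:  # also rejects NaN
                return 0.0
        return 1.0
    except (sympy.SympifyError, TypeError, ValueError):
        return 0.0  # unparsable or non-numeric answers earn no reward

print(correctness_reward("2*x + x", "3*x"))  # 1.0
print(correctness_reward("x**2", "x"))       # 0.0
```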

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
- **Weight decay:** 0.1
- **Warmup steps:** 2000
- **Gradient clipping:** 1.0
- **Precision:** BFloat16
- **Total GPU hours:** 30,720
- **Estimated cost:** $450,000 USD
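
The pre-training learning-rate settings (peak 1.5e-4, 2,000 warmup steps, cosine decay) can be written as a schedule function. The card gives neither the total step count nor the final learning rate, so `total_steps` and `min_lr` below are placeholder assumptions, and linear warmup is assumed because it is the usual pairing with cosine decay.

```python
import math

def learning_rate(step: int, peak_lr: float = 1.5e-4, warmup_steps: int = 2000,
                  total_steps: int = 100_000, min_lr: float = 0.0) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps.

    total_steps and min_lr are placeholders; the card does not state them.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for s in (0, 1000, 2000, 51_000, 100_000):
    print(s, f"{learning_rate(s):.3e}")
```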

### Compute Infrastructure

- **Pre-training:** 512x NVIDIA H100 80GB GPUs
- **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
- **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
- **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)
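
Tensor (8-way) × pipeline (4-way) × data (16-way) parallelism gives 8 × 4 × 16 = 512 ranks, matching the 512-GPU pre-training cluster. The sketch below maps a global rank to its coordinates in that grid, assuming tensor-parallel ranks are adjacent (so, on 8-GPU nodes, each tensor group stays within one node), then pipeline stages, then data-parallel replicas. Frameworks order these dimensions differently, so this is illustrative rather than the exact DeepSpeed layout.

```python
from typing import NamedTuple

TP, PP, DP = 8, 4, 16  # tensor, pipeline, data parallel degrees from the card

class Coords(NamedTuple):
    data: int
    pipeline: int
    tensor: int

def rank_to_coords(rank: int) -> Coords:
    """Map a global rank to (data, pipeline, tensor) indices, tensor varying fastest."""
    if not 0 <= rank < TP * PP * DP:
        raise ValueError(f"rank {rank} outside world size {TP * PP * DP}")
    tensor = rank % TP
    pipeline = (rank // TP) % PP
    data = rank // (TP * PP)
    return Coords(data, pipeline, tensor)

def coords_to_rank(c: Coords) -> int:
    """Inverse of rank_to_coords."""
    return (c.data * PP + c.pipeline) * TP + c.tensor

print(TP * PP * DP)         # 512
print(rank_to_coords(0))    # Coords(data=0, pipeline=0, tensor=0)
print(rank_to_coords(511))  # Coords(data=15, pipeline=3, tensor=7)
```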

## Evaluation

### Mathematical Reasoning

| Benchmark | Kirim-1-Math | Reference |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |
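
Benchmarks such as MATH are commonly scored by comparing the final answer, which solutions wrap in `\boxed{...}`, against a reference. The card does not describe its scoring protocol, so the helpers below are only a sketch of that common convention: `extract_boxed` pulls out the last `\boxed{}` argument with brace matching (so nested braces such as `\frac{1}{2}` survive), and `is_correct` compares strings after removing whitespace. Both names are hypothetical.

```python
from typing import Optional

def extract_boxed(text: str) -> Optional[str]:
    """Return the argument of the last \\boxed{...} in text, or None."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        if text[i] == "{":
            depth += 1
        elif text[i] == "}":
            depth -= 1
            if depth == 0:
                return text[begin:i]
    return None  # unbalanced braces: treat as no answer

def is_correct(model_output: str, reference: str) -> bool:
    """Exact match after stripping whitespace (a deliberately simple normalization)."""
    answer = extract_boxed(model_output)
    return answer is not None and "".join(answer.split()) == "".join(reference.split())

print(extract_boxed(r"Thus $x = \boxed{\frac{1}{2}}$."))  # \frac{1}{2}
print(is_correct(r"The roots are \boxed{2, 3}", "2,3"))    # True
```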

### Tool Calling

| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |

### Code Generation

| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
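
Pass@k is usually reported with the unbiased estimator from the HumanEval paper (Chen et al., 2021): generate n ≥ k samples per problem, count the c samples that pass, compute 1 − C(n − c, k) / C(n, k), and average over problems. The card does not state its n, so this is a reference sketch of the estimator, not its exact evaluation setup; the sample counts in the usage lines are made up.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one problem: n samples generated, c of them correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results: list, k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)

# Hypothetical counts for three problems, 20 samples each.
results = [(20, 15), (20, 2), (20, 0)]
print(round(mean_pass_at_k(results, k=1), 4))
print(round(mean_pass_at_k(results, k=10), 4))
```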

### Performance

- **Inference Speed:** 45 tokens/s (A100 80GB)
- **Memory:** 60 GB (BF16), 30 GB (INT8), 20 GB (INT4)
- **Latency:** 89 ms mean, 145 ms p95
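
The BF16 and INT8 memory figures follow from parameter count × bytes per parameter: 30B × 2 bytes = 60 GB and 30B × 1 byte = 30 GB. Raw 4-bit weights would take 15 GB, below the listed 20 GB; quantized formats typically keep some tensors and scaling factors at higher precision. Generation also needs a KV cache, which the sketch below estimates from the architecture table, assuming a head dimension of 5,120 / 40 = 128 and a BF16 cache. If all 40 heads had their own keys and values (plain multi-head attention), the cache would be five times larger.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Raw weight storage in GB (10**9 bytes), ignoring runtime overhead."""
    return n_params * bits_per_param / 8 / 1e9

def kv_cache_bytes_per_token(layers: int = 48, kv_heads: int = 8,
                             head_dim: int = 128, bytes_per_value: int = 2) -> int:
    """K and V vectors stored per token for every layer and KV head."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value

for label, bits in (("BF16", 16), ("INT8", 8), ("INT4", 4)):
    print(f"{label}: {weight_memory_gb(30e9, bits):.0f} GB of weights")

per_token = kv_cache_bytes_per_token()
print(f"KV cache: {per_token / 1024:.0f} KiB/token, "
      f"{per_token * 32_768 / 2**30:.1f} GiB for one 32,768-token sequence")
```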

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** 60 days (30,720 GPU hours)
- **Estimated CO₂:** ~8,500 kg CO₂eq
- **Power Consumption:** ~850 MWh

We are committed to reducing environmental impact through efficient training and model optimization.
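
The compute and emissions figures can be re-derived from the stage durations in Training Procedure. The sketch assumes every GPU ran for the full length of its stage and that Stage 1, whose GPU count the card does not give, used the 512-GPU pre-training cluster; Stages 3 to 5 use the 128-GPU fine-tuning cluster. Under these assumptions the total is about 599,000 GPU-hours, whereas the 30,720 figure listed above equals 512 × 60 (GPU count times days). The same arithmetic shows that 8,500 kg CO₂eq over 850 MWh implies a grid carbon intensity of 0.01 kg CO₂eq/kWh.

```python
# (days, GPUs) per stage. Stage 1's GPU count is an assumption; the others follow
# the Compute Infrastructure section (512 for pre-training, 128 for fine-tuning).
stages = {
    "1 expansion": (15, 512),  # assumed
    "2 math pre-training": (30, 512),
    "3 instruction tuning": (5, 128),
    "4 tool calling": (3, 128),
    "5 RL": (7, 128),
}

total_days = sum(days for days, _ in stages.values())
gpu_hours = sum(days * 24 * gpus for days, gpus in stages.values())
print(f"{total_days} days, {gpu_hours:,} GPU-hours")

# Emissions = energy x grid carbon intensity, so the two reported figures imply:
energy_kwh = 850_000  # ~850 MWh
co2_kg = 8_500        # ~8,500 kg CO2eq
print(f"implied intensity: {co2_kg / energy_kwh:.3f} kg CO2eq/kWh")
```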

## Technical Specifications

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
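
A back-of-the-envelope parameter count from the table helps sanity-check the architecture. The sketch assumes a LLaMA-style block (bias-free projections, GQA with head dimension 5,120 / 40 = 128, a three-matrix SiLU-gated MLP, two RMSNorms per layer) and untied input/output embeddings; none of these details beyond the table are confirmed by the card. Under these assumptions the listed dimensions add up to roughly 14.3B parameters rather than 30B, so the table and the headline count should be reconciled against the released model configuration.

```python
def transformer_params(hidden=5120, layers=48, heads=40, kv_heads=8,
                       intermediate=13824, vocab=102400, tied_embeddings=False):
    """Approximate parameter count for an assumed LLaMA-style decoder."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O; K and V (GQA)
    mlp = 3 * hidden * intermediate                        # gate, up, down projections
    norms = 2 * hidden                                     # two RMSNorm weight vectors
    per_layer = attention + mlp + norms
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * per_layer + embeddings + hidden        # + final RMSNorm

total = transformer_params()
print(f"{total:,} parameters (~{total / 1e9:.1f}B)")
```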

### Special Features

- **Tool Calling:** JSON-based function calling
- **Symbolic Solver:** SymPy integration
- **Code Execution:** Sandboxed Python runtime
- **LaTeX Formatting:** Automatic equation formatting
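
The card says tool calls are JSON-based but does not publish the schema. As an illustration only, the sketch below assumes a hypothetical format `{"name": ..., "arguments": {...}}` and routes it to a small tool registry. The `calculator` tool evaluates arithmetic by walking Python's `ast` instead of calling `eval`, and failures come back as structured errors the model can read and recover from, in the spirit of the error-handling training described in Stage 4.

```python
import ast
import json
import operator

# Hypothetical tool-call format: {"name": "<tool>", "arguments": {...}}
_BINARY = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
           ast.Div: operator.truediv, ast.Pow: operator.pow}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def _evaluate(node):
    """Evaluate a parsed arithmetic expression, rejecting any other syntax."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return node.value
    if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
        return _UNARY[type(node.op)](_evaluate(node.operand))
    if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
        left, right = _evaluate(node.left), _evaluate(node.right)
        if isinstance(node.op, ast.Pow) and abs(right) > 1000:
            raise ValueError("exponent too large")  # guard against huge computations
        return _BINARY[type(node.op)](left, right)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")

def calculator(expression: str):
    return _evaluate(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculator}

def dispatch(tool_call: str) -> str:
    """Run one JSON tool call and return a JSON result for the model to read."""
    try:
        call = json.loads(tool_call)
        result = TOOLS[call["name"]](**call["arguments"])
        return json.dumps({"ok": True, "result": result})
    except (KeyError, TypeError, ValueError, SyntaxError, ArithmeticError) as exc:
        # json.JSONDecodeError subclasses ValueError, so malformed JSON lands here too.
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})

print(dispatch('{"name": "calculator", "arguments": {"expression": "2**10 + 3*4"}}'))
print(dispatch('{"name": "calculator", "arguments": {"expression": "1/0"}}'))
```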

## Citation

```bibtex
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={{Qiling Research}},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
```

## Model Card Authors

Qiling Research

## Ethical Considerations

### Educational Impact

- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not a replacement

### Accessibility

- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen existing gaps if access is unequal

### Verification

- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education

## Glossary

- **Tool Calling:** Ability to invoke external functions for computation
- **Symbolic Solver:** Algebraic manipulation system (SymPy)
- **GQA:** Grouped-Query Attention; several query heads share one key/value head (here 40 query heads and 8 KV heads), reducing KV-cache memory
- **RoPE:** Rotary Position Embedding
- **YaRN:** Yet another RoPE extension method
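
In grouped-query attention with the table's 40 query heads and 8 KV heads, each KV head serves a group of 40 / 8 = 5 query heads, which is what shrinks the KV cache. The NumPy sketch below shows the mechanism for a single causal sequence by repeating each KV head across its query group; production kernels avoid materializing the repeated tensors, and this is not the model's actual implementation. Sequence length and head size are toy values.

```python
import numpy as np

def grouped_query_attention(q, k, v):
    """Causal scaled dot-product attention with shared KV heads.

    q: (n_q_heads, seq, head_dim); k, v: (n_kv_heads, seq, head_dim).
    """
    n_q, seq, head_dim = q.shape
    n_kv = k.shape[0]
    if n_q % n_kv:
        raise ValueError("query heads must be a multiple of KV heads")
    group = n_q // n_kv
    # Query heads [g*group, (g+1)*group) all read KV head g.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    future = np.triu(np.ones((seq, seq), dtype=bool), k=1)  # positions after each query
    scores = np.where(future, -np.inf, scores)
    scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
q = rng.normal(size=(40, 6, 16))  # 40 query heads, as in the table
k = rng.normal(size=(8, 6, 16))   # 8 KV heads
v = rng.normal(size=(8, 6, 16))
out = grouped_query_attention(q, k, v)
print(out.shape)  # (40, 6, 16)
```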