# Model Card for Kirim-1-Math
## Model Details
### Model Description
**Kirim-1-Math** is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series to feature built-in tool calling, allowing it to perform mathematical computations and symbolic manipulations and to execute code for numerical solutions.
- **Developed by:** Kirim AI Team
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Language(s):** Chinese, English
- **License:** Apache 2.0
- **Base Model:** Kirim-V1-base (expanded from 13B to 30B)
- **Specialization:** Mathematical reasoning, theorem proving, symbolic computation
### Model Capabilities
- **Mathematical Reasoning**: Solve problems from elementary to olympiad level
- **Tool Calling**: Invoke calculator, symbolic-solver, derivative, integration, and code-execution tools
- **Step-by-Step Solutions**: Show detailed work for problem-solving
- **LaTeX Output**: Format mathematical expressions properly
- **Bilingual**: Handle problems in both Chinese and English
- **Code Generation**: Write and execute Python/SymPy code for numerical solutions
## Model Sources
- **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
- **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
- **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
- **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)
## Uses
### Direct Use
The model can be used directly for:
- **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
- **Homework Assistance**: Solve problems across all difficulty levels
- **Competition Preparation**: Practice for AMC, AIME, IMO, Putnam
- **Research Assistance**: Verify proofs and perform symbolic computations
- **Code-Assisted Problem Solving**: Use numerical methods for complex calculations
### Downstream Use
Fine-tuning possibilities:
- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems
### Out-of-Scope Use
The model should NOT be used for:
- **Academic dishonesty**: Cheating on exams or assignments
- **Safety-critical systems**: Without human verification (e.g., structural engineering calculations)
- **Financial advice**: Trading or investment decisions without expert review
- **Medical calculations**: Drug dosages or medical equipment calibration
- **Legal matters**: Calculations with legal consequences, without professional verification
## Bias, Risks, and Limitations
### Known Limitations
**Technical Limitations:**
- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems
**Reasoning Limitations:**
- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution
**Computational Limitations:**
- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May timeout on very long computations
### Risks and Biases
**Potential Risks:**
- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources
**Mitigation Strategies:**
- Always verify critical results with human experts
- Use temperature=0.1 for deterministic mathematical reasoning
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods
- Implement appropriate safeguards in educational settings
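As a concrete illustration of the cross-checking advice above, a model's claimed answer can be verified independently by a second method. This sketch uses the quadratic formula in plain Python; the problem and roots are illustrative:

```python
import math

def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    """Return the real roots of a*x**2 + b*x + c = 0, sorted ascending."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    sq = math.sqrt(disc)
    return sorted({(-b - sq) / (2 * a), (-b + sq) / (2 * a)})

# Cross-check a model's claimed roots {2, 3} for x**2 - 5x + 6 = 0
roots = solve_quadratic(1, -5, 6)
assert roots == [2.0, 3.0]

# Second method: substitute each root back into the original equation
for r in roots:
    assert abs(r * r - 5 * r + 6) < 1e-9
```

The same pattern applies more generally: solve once with the model, then confirm with an independent symbolic or numerical method before trusting the result.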
## How to Get Started
### Installation
```bash
pip install torch transformers accelerate sympy
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Using the Inference Script
```bash
# Interactive mode
python inference_math.py --interactive
# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"
# With quantization
python inference_math.py --load_in_4bit --interactive
```
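The `--load_in_4bit` flag presumably maps onto a standard bitsandbytes quantization config. A hypothetical equivalent in plain `transformers` (the exact settings used by the script are not documented here; NF4 with BF16 compute is a common default) would look like:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit config mirroring the script's --load_in_4bit flag
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

This brings the memory footprint down to roughly the INT4 figure quoted under Performance below, at some cost in numerical fidelity.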
## Training Details
### Training Data
**Mathematical Corpus (500B tokens):**
- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)
**Code Corpus (200B tokens):**
- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations
**General Corpus (800B tokens):**
- From Kirim-V1-base pre-training
**Total: 1.5 Trillion tokens**
### Training Procedure
#### Stage 1: Model Expansion (15 days)
- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48
#### Stage 2: Mathematical Pre-training (30 days)
- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision
#### Stage 3: Instruction Tuning (5 days)
- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs
#### Stage 4: Tool Calling Training (3 days)
- 50K tool-calling examples
- Function definition and execution
- Error handling and recovery
#### Stage 5: Reinforcement Learning (7 days)
- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification
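The actual reward model is not published; purely as a sketch of how symbolic verification could produce a correctness reward, one could check that a model answer and a reference simplify to the same expression with SymPy (expressions here are illustrative):

```python
import sympy as sp

def correctness_reward(model_answer: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 if the two answers are
    symbolically equivalent (their difference simplifies to zero),
    else 0.0."""
    try:
        diff = sp.simplify(sp.sympify(model_answer) - sp.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except sp.SympifyError:
        return 0.0  # unparseable answers earn no reward

# Equivalent forms of the same expression score full reward
print(correctness_reward("(x + 1)**2", "x**2 + 2*x + 1"))  # 1.0
print(correctness_reward("x**2", "x**2 + 1"))              # 0.0
```

A numerical check (sampling random points and comparing values) would serve the same role where symbolic simplification is too slow.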
#### Training Hyperparameters
- **Optimizer:** AdamW
- **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
- **Weight decay:** 0.1
- **Warmup steps:** 2000
- **Gradient clipping:** 1.0
- **Precision:** BFloat16
- **Total GPU hours:** 30,720
- **Estimated cost:** $450,000 USD
### Compute Infrastructure
- **Pre-training:** 512x NVIDIA H100 80GB GPUs
- **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
- **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
- **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)
## Evaluation
### Mathematical Reasoning
| Benchmark | Score | Comparison |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |
### Tool Calling
| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |
### Code Generation
| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
### Performance
- **Inference Speed:** 45 tokens/second (A100 80GB)
- **Memory:** 60GB (BF16), 30GB (INT8), 20GB (INT4)
- **Latency:** 89ms mean, 145ms p95
## Environmental Impact
- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** 60 days (30,720 GPU hours)
- **Estimated CO₂:** ~8,500 kg CO₂eq
- **Power Consumption:** ~850 MWh
We are committed to reducing environmental impact through efficient training and model optimization.
## Technical Specifications
### Model Architecture
| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
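From the table above, the per-sequence KV-cache footprint at full context can be estimated with a back-of-the-envelope calculation (the head dimension is assumed to follow the usual hidden_size / attention_heads convention):

```python
# Architecture values from the table above
hidden_size = 5120
attn_heads = 40
kv_heads = 8          # GQA
layers = 48
context = 32768
bytes_per_elem = 2    # BF16

head_dim = hidden_size // attn_heads  # 128, assumed convention

# K and V caches across all layers for one full-context sequence
kv_bytes = 2 * layers * context * kv_heads * head_dim * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 6.0 GiB

# GQA stores 8 KV heads instead of 40, a 5x cache reduction vs full MHA
print(f"GQA saving: {attn_heads // kv_heads}x")
```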
### Special Features
- **Tool Calling:** JSON-based function calling
- **Symbolic Solver:** SymPy integration
- **Code Execution:** Sandboxed Python runtime
- **LaTeX Formatting:** Automatic equation formatting
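The exact tool-calling schema is defined by the repository; purely as an illustration of how a JSON-based call like the one described above might be dispatched, here is a minimal sketch (the tool names, JSON fields, and registry are hypothetical):

```python
import json
import sympy as sp

# Hypothetical registry mapping tool names to implementations
TOOLS = {
    "calculator": lambda args: str(sp.sympify(args["expression"]).evalf()),
    "derivative": lambda args: str(sp.diff(sp.sympify(args["expression"]),
                                           sp.symbols(args["variable"]))),
}

def dispatch(tool_call_json: str) -> str:
    """Parse a JSON tool call and execute the matching tool."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](call["arguments"])

# Example: the model emits a derivative request as JSON
call = '{"name": "derivative", "arguments": {"expression": "x**3 + 2*x**2", "variable": "x"}}'
print(dispatch(call))  # 3*x**2 + 4*x
```

In the real system the tool result would be fed back into the conversation for the model to integrate into its solution.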
## Citation
```bibtex
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
```
## Model Card Authors
Qiling Research
## Ethical Considerations
### Educational Impact
- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not replacement
### Accessibility
- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen gap if access is unequal
### Verification
- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education
## Glossary
- **Tool Calling:** Ability to invoke external functions for computation
- **Symbolic Solver:** Algebraic manipulation system (SymPy)
- **GQA:** Grouped Query Attention for efficiency
- **RoPE:** Rotary Position Embedding
- **YaRN:** Yet another RoPE extensioN, a method for extending the context window