# Model Card for Kirim-1-Math
## Model Details
### Model Description
**Kirim-1-Math** is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series with built-in tool calling, which lets it run mathematical computations, perform symbolic manipulation, and execute code for numerical solutions.
- **Developed by:** Kirim AI Team
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Language(s):** Chinese, English
- **License:** Apache 2.0
- **Base Model:** Kirim-V1-base (expanded from 13B to 30B)
- **Specialization:** Mathematical reasoning, theorem proving, symbolic computation
### Model Capabilities
- **Mathematical Reasoning**: Solve problems from elementary to olympiad level
- **Tool Calling**: Invoke calculator, symbolic solver, differentiation, integration, and code-execution tools
- **Step-by-Step Solutions**: Show detailed work for problem-solving
- **LaTeX Output**: Format mathematical expressions properly
- **Bilingual**: Handle problems in both Chinese and English
- **Code Generation**: Write and execute Python/SymPy code for numerical solutions
## Model Sources
- **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
- **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
- **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
- **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)
## Uses
### Direct Use
The model can be used directly for:
- **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
- **Homework Assistance**: Solve problems from elementary to olympiad level
- **Competition Preparation**: Practice for AMC, AIME, IMO, Putnam
- **Research Assistance**: Verify proofs and perform symbolic computations
- **Code-Assisted Problem Solving**: Use numerical methods for complex calculations
### Downstream Use
Fine-tuning possibilities:
- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems
### Out-of-Scope Use
The model should NOT be used for:
- **Academic dishonesty**: Cheating on exams or assignments
- **Safety-critical systems**: Without human verification (e.g., structural engineering calculations)
- **Financial advice**: Trading or investment decisions without expert review
- **Medical calculations**: Drug dosages or medical equipment calibration
- **Legal matters**: Calculations used in legal contexts without review by a qualified mathematician or lawyer
## Bias, Risks, and Limitations
### Known Limitations
**Technical Limitations:**
- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Knowledge is limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems
**Reasoning Limitations:**
- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution
**Computational Limitations:**
- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May time out on very long computations
### Risks and Biases
**Potential Risks:**
- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources
**Mitigation Strategies:**
- Always verify critical results with human experts
- Use a low sampling temperature (e.g., temperature=0.1) for near-deterministic reasoning, or greedy decoding when fully reproducible output is required
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods
- Implement appropriate safeguards in educational settings
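The "cross-check answers with multiple methods" strategy can be illustrated with a small sketch. It solves x² - 5x + 6 = 0 with the quadratic formula and then verifies the roots independently by substitution. The helper names `solve_quadratic` and `verify_roots` are hypothetical and are not part of the model's tooling.

```python
import math

def solve_quadratic(a, b, c):
    """Return the real roots of a*x^2 + b*x + c = 0 via the quadratic formula."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    root = math.sqrt(disc)
    return sorted({(-b - root) / (2 * a), (-b + root) / (2 * a)})

def verify_roots(a, b, c, roots, tol=1e-9):
    """Second method: substitute each candidate root back into the polynomial."""
    return all(abs(a * x * x + b * x + c) <= tol for x in roots)

roots = solve_quadratic(1, -5, 6)
print(roots, verify_roots(1, -5, 6, roots))  # [2.0, 3.0] True
```

The same pattern applies to model output: treat the model's answer as a candidate and accept it only if an independent check agrees.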
## How to Get Started
### Installation
```bash
pip install torch transformers accelerate sympy
```
### Basic Usage
```python
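from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True,
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
# add_generation_prompt=True appends the assistant turn marker so the model answers
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_dict=True, return_tensors="pt"
).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, not the prompt
prompt_length = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))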
```
### Using the Inference Script
```bash
# Interactive mode
python inference_math.py --interactive
# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"
# With quantization
python inference_math.py --load_in_4bit --interactive
```
## Training Details
### Training Data
**Mathematical Corpus (500B tokens):**
- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)
**Code Corpus (200B tokens):**
- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations
**General Corpus (800B tokens):**
- From Kirim-V1-base pre-training
**Total: 1.5 Trillion tokens**
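The corpus sizes above can be tallied to confirm that the subtotals and the overall total agree. This is only an arithmetic check on the figures listed in this section; the dictionary keys are shorthand labels chosen for the example.

```python
# Token counts in billions, copied from the lists above.
math_corpus = {
    "proofs": 125,
    "olympiad": 150,
    "arxiv": 100,
    "textbooks": 75,
    "qa": 50,
}
math_total = sum(math_corpus.values())   # mathematical corpus
code_total = 200                         # code corpus
general_total = 800                      # general corpus from Kirim-V1-base

grand_total = math_total + code_total + general_total
print(math_total, grand_total)  # 500 1500 (i.e., 1.5 trillion tokens)
```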
### Training Procedure
#### Stage 1: Model Expansion (15 days)
- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48
#### Stage 2: Mathematical Pre-training (30 days)
- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision
#### Stage 3: Instruction Tuning (5 days)
- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs
#### Stage 4: Tool Calling Training (3 days)
- 50K tool-calling examples
- Function definition and execution
- Error handling and recovery
#### Stage 5: Reinforcement Learning (7 days)
- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification
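The card does not specify how the correctness reward is computed. As a rough illustration of numerical verification only, a binary reward might compare the model's final answer to a reference value. The function names, the tolerance, and the binary 0/1 scheme below are all assumptions made for this sketch.

```python
from fractions import Fraction

def parse_number(text):
    """Parse '3', '0.75', or '3/4' into an exact Fraction; return None on failure."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None

def correctness_reward(predicted, reference, tol=Fraction(1, 10**6)):
    """Hypothetical binary reward: 1.0 if the answers agree within tol, else 0.0."""
    p, r = parse_number(predicted), parse_number(reference)
    if p is None or r is None:
        return 0.0  # unparseable answers earn no reward
    return 1.0 if abs(p - r) <= tol else 0.0

print(correctness_reward("0.75", "3/4"))  # 1.0
```

Parsing into `Fraction` keeps the comparison exact, so equivalent forms such as `0.75` and `3/4` are scored the same.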
#### Training Hyperparameters
- **Optimizer:** AdamW
- **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
- **Weight decay:** 0.1
- **Warmup steps:** 2000
- **Gradient clipping:** 1.0
- **Precision:** BFloat16
- **Total GPU hours:** 30,720
- **Estimated cost:** $450,000 USD
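The pre-training schedule (peak learning rate 1.5e-4, 2000 warmup steps, cosine decay) can be sketched as follows. The card does not state the total number of steps, the final learning rate, or the warmup shape, so `total_steps`, `min_lr`, and the linear warmup here are assumed values for illustration.

```python
import math

def learning_rate(step, peak_lr=1.5e-4, warmup_steps=2000,
                  total_steps=100_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    total_steps and min_lr are assumptions; the card does not state them.
    """
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 1000, 2000, 51_000, 100_000):
    print(step, learning_rate(step))
```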
### Compute Infrastructure
- **Pre-training:** 512x NVIDIA H100 80GB GPUs
- **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
- **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
- **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)
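The three parallelism degrees multiply to the pre-training GPU count, which the short check below makes explicit. The variable names are illustrative.

```python
tensor_parallel = 8
pipeline_parallel = 4
data_parallel = 16

gpus_per_replica = tensor_parallel * pipeline_parallel  # GPUs holding one model copy
total_gpus = gpus_per_replica * data_parallel           # all replicas together
print(gpus_per_replica, total_gpus)  # 32 512
```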
## Evaluation
### Mathematical Reasoning
| Benchmark | Score | Comparison |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |
### Tool Calling
| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |
### Code Generation
| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
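The card does not say how Pass@k was computed. A common choice in code-generation evaluation is the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples are drawn per task and c of them pass. The sketch below assumes that estimator; it is not confirmed as the method used for the table above.

```python
from math import comb

def pass_at_k(n, c, k):
    """Estimate pass@k from n samples of which c passed: 1 - C(n-c, k) / C(n, k)."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 20 samples for one task, 5 of which pass the tests.
print(pass_at_k(20, 5, 1), pass_at_k(20, 5, 10))
```

The benchmark score is then the mean of this per-task estimate over all tasks.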
### Performance
- **Inference Speed:** 45 tokens/second (A100 80GB)
- **Memory:** 60GB (BF16), 30GB (INT8), 20GB (INT4)
- **Latency:** 89ms mean, 145ms p95
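The memory figures can be compared against a weights-only estimate of parameters × bits per parameter. This sketch ignores activations, the KV cache, and quantization metadata, so real usage will be higher.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Weights-only footprint in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9

params = 30e9  # parameter count stated in this card
for name, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(params, bits):.0f} GB")
```

The BF16 and INT8 estimates match the reported 60GB and 30GB. The INT4 estimate is 15GB, below the reported 20GB; the gap may reflect quantization overhead or layers kept at higher precision, though the card does not break this down.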
## Environmental Impact
- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** 60 days (30,720 GPU hours)
- **Estimated CO₂:** ~8,500 kg CO₂eq
- **Power Consumption:** ~850 MWh
We are committed to reducing environmental impact through efficient training and model optimization.
## Technical Specifications
### Model Architecture
| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
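To show what grouped-query attention buys at this configuration, the sketch below estimates KV-cache size from the table values. It assumes the head dimension equals hidden size divided by attention heads and that the cache is stored in BF16; neither is stated in the card.

```python
hidden_size = 5120
num_heads = 40
num_kv_heads = 8
num_layers = 48
context_length = 32_768
bytes_per_value = 2  # BF16 (assumed cache precision)

head_dim = hidden_size // num_heads         # assumed: hidden size split evenly across heads
queries_per_kv = num_heads // num_kv_heads  # query heads sharing each KV head under GQA

# Keys and values (factor 2) for every layer and KV head, per token.
kv_bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_value
kv_gib_full_context = kv_bytes_per_token * context_length / 2**30

print(head_dim, queries_per_kv, kv_bytes_per_token, kv_gib_full_context)
# 128 5 196608 6.0
```

Under these assumptions a full 32,768-token context needs about 6 GiB of cache per sequence; with 40 KV heads instead of 8 it would be five times larger.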
### Special Features
- **Tool Calling:** JSON-based function calling
- **Symbolic Solver:** SymPy integration
- **Code Execution:** Sandboxed Python runtime
- **LaTeX Formatting:** Automatic equation formatting
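The exact JSON schema for tool calls is not documented in this card. As a sketch of how a JSON-based tool call could be dispatched on the host side, the example below assumes a hypothetical `{"name": ..., "arguments": {...}}` format and a single `calculator` tool. It parses arithmetic with `ast` rather than calling `eval()`, in the spirit of sandboxed execution.

```python
import ast
import json
import operator

# Arithmetic operators the hypothetical "calculator" tool accepts.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def _eval(node):
    """Evaluate a parsed arithmetic expression without calling eval()."""
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.operand))
    raise ValueError("unsupported expression")

def calculator(expression):
    """Hypothetical tool: evaluate a basic arithmetic expression."""
    return _eval(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculator}  # hypothetical tool registry

def dispatch(tool_call_json):
    """Run one JSON tool call and return a JSON result or error message."""
    try:
        call = json.loads(tool_call_json)
        result = TOOLS[call["name"]](**call["arguments"])
        return json.dumps({"result": result})
    except Exception as exc:  # report failures back instead of crashing
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})

print(dispatch('{"name": "calculator", "arguments": {"expression": "2 * (3 + 4)"}}'))
# {"result": 14}
```

Returning errors as JSON rather than raising lets the model see the failure and retry, which fits the error-handling behavior described for tool-calling training.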
## Citation
```bibtex
@misc{kirim2025math,
title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
author={Qiling Research},
year={2025},
publisher={Kirim AI},
url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
```
## Model Card Authors
Qiling Research
## Ethical Considerations
### Educational Impact
- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not replacement
### Accessibility
- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen gap if access is unequal
### Verification
- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education
## Glossary
- **Tool Calling:** Ability to invoke external functions for computation
- **Symbolic Solver:** Algebraic manipulation system (SymPy)
- **GQA:** Grouped Query Attention for efficiency
- **RoPE:** Rotary Position Embedding
- **YaRN:** Yet another RoPE extensioN, a method for extending the context window of RoPE-based models