# Model Card for Kirim-1-Math
## Model Details
### Model Description
**Kirim-1-Math** is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series to feature built-in tool calling, allowing it to perform mathematical computations and symbolic manipulations and to execute code for numerical solutions.
- **Developed by:** Kirim AI Team
- **Model type:** Causal Language Model (Decoder-only Transformer)
- **Language(s):** Chinese, English
- **License:** Apache 2.0
- **Base Model:** Kirim-V1-base (expanded from 13B to 30B)
- **Specialization:** Mathematical reasoning, theorem proving, symbolic computation
### Model Capabilities
- **Mathematical Reasoning**: Solve problems from elementary to olympiad level
- **Tool Calling**: Invoke calculator, symbolic-solver, derivative, integration, and code-execution tools
- **Step-by-Step Solutions**: Show detailed work for problem-solving
- **LaTeX Output**: Format mathematical expressions properly
- **Bilingual**: Handle problems in both Chinese and English
- **Code Generation**: Write and execute Python/SymPy code for numerical solutions
## Model Sources
- **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
- **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
- **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
- **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)
## Uses
### Direct Use
The model can be used directly for:
- **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
- **Homework Assistance**: Solve problems across all difficulty levels
- **Competition Preparation**: Practice for AMC, AIME, IMO, Putnam
- **Research Assistance**: Verify proofs and perform symbolic computations
- **Code-Assisted Problem Solving**: Use numerical methods for complex calculations
### Downstream Use
Fine-tuning possibilities:
- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems
### Out-of-Scope Use
The model should NOT be used for:
- **Academic dishonesty**: Cheating on exams or assignments
- **Safety-critical systems**: Without human verification (e.g., structural engineering calculations)
- **Financial advice**: Trading or investment decisions without expert review
- **Medical calculations**: Drug dosages or medical equipment calibration
- **Legal matters**: Calculations with legal consequences, without professional verification
## Bias, Risks, and Limitations
### Known Limitations
**Technical Limitations:**
- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with extremely novel mathematical concepts
- Limited to training data through October 2024
- Tool execution can fail for edge cases
- Performance degrades on extremely complex graduate-level problems
**Reasoning Limitations:**
- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution
**Computational Limitations:**
- Cannot perform arbitrarily large calculations without tools
- Numerical precision limited by underlying libraries
- May timeout on very long computations
### Risks and Biases
**Potential Risks:**
- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources
**Mitigation Strategies:**
- Always verify critical results with human experts
- Use temperature=0.1 for deterministic mathematical reasoning
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods
- Implement appropriate safeguards in educational settings
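As a concrete illustration of the cross-checking advice above, a model's claimed answer can be verified independently by a second method. This sketch uses the quadratic formula in plain Python; the problem and roots are illustrative:

```python
import math

def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    """Return the real roots of a*x**2 + b*x + c = 0, sorted ascending."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    sq = math.sqrt(disc)
    return sorted({(-b - sq) / (2 * a), (-b + sq) / (2 * a)})

# Cross-check a model's claimed roots {2, 3} for x**2 - 5x + 6 = 0
roots = solve_quadratic(1, -5, 6)
assert roots == [2.0, 3.0]

# Second method: substitute each root back into the original equation
for r in roots:
    assert abs(r * r - 5 * r + 6) < 1e-9
```

The same pattern applies more generally: solve once with the model, then confirm with an independent symbolic or numerical method before trusting the result.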
## How to Get Started
### Installation
```bash
pip install torch transformers accelerate sympy
```
### Basic Usage
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model and tokenizer
model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    trust_remote_code=True
)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
### Using the Inference Script
```bash
# Interactive mode
python inference_math.py --interactive
# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"
# With quantization
python inference_math.py --load_in_4bit --interactive
```
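The `--load_in_4bit` flag presumably maps onto a standard bitsandbytes quantization config. A hypothetical equivalent in plain `transformers` (the exact settings used by the script are not documented here; NF4 with BF16 compute is a common default) would look like:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit config mirroring the script's --load_in_4bit flag
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "Kirim-ai/Kirim-1-Math",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
```

This brings the memory footprint down to roughly the INT4 figure quoted under Performance below, at some cost in numerical fidelity.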
## Training Details
### Training Data
**Mathematical Corpus (500B tokens):**
- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)
**Code Corpus (200B tokens):**
- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle, GitHub
- Algorithm implementations
**General Corpus (800B tokens):**
- From Kirim-V1-base pre-training
**Total: 1.5 Trillion tokens**
### Training Procedure
#### Stage 1: Model Expansion (15 days)
- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48
#### Stage 2: Mathematical Pre-training (30 days)
- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision
#### Stage 3: Instruction Tuning (5 days)
- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, etc.
- Learning rate: 2e-5
- 3 epochs
#### Stage 4: Tool Calling Training (3 days)
- 50K tool-calling examples
- Function definition and execution
- Error handling and recovery
#### Stage 5: Reinforcement Learning (7 days)
- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification
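The actual reward model is not published; purely as a sketch of how symbolic verification could produce a correctness reward, one could check that a model answer and a reference simplify to the same expression with SymPy (expressions here are illustrative):

```python
import sympy as sp

def correctness_reward(model_answer: str, reference: str) -> float:
    """Hypothetical verifier: reward 1.0 if the two answers are
    symbolically equivalent (their difference simplifies to zero),
    else 0.0."""
    try:
        diff = sp.simplify(sp.sympify(model_answer) - sp.sympify(reference))
        return 1.0 if diff == 0 else 0.0
    except sp.SympifyError:
        return 0.0  # unparseable answers earn no reward

# Equivalent forms of the same expression score full reward
print(correctness_reward("(x + 1)**2", "x**2 + 2*x + 1"))  # 1.0
print(correctness_reward("x**2", "x**2 + 1"))              # 0.0
```

A numerical check (sampling random points and comparing values) would serve the same role where symbolic simplification is too slow.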
#### Training Hyperparameters
- **Optimizer:** AdamW
- **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
- **Weight decay:** 0.1
- **Warmup steps:** 2000
- **Gradient clipping:** 1.0
- **Precision:** BFloat16
- **Total GPU hours:** 30,720
- **Estimated cost:** $450,000 USD
### Compute Infrastructure
- **Pre-training:** 512x NVIDIA H100 80GB GPUs
- **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
- **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
- **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)
## Evaluation
### Mathematical Reasoning
| Benchmark | Score | Comparison |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |
### Tool Calling
| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |
### Code Generation
| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
### Performance
- **Inference Speed:** 45 tokens/second (A100 80GB)
- **Memory:** 60GB (BF16), 30GB (INT8), 20GB (INT4)
- **Latency:** 89ms mean, 145ms p95
## Environmental Impact
- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** 60 days (30,720 GPU hours)
- **Estimated CO₂:** ~8,500 kg CO₂eq
- **Power Consumption:** ~850 MWh
We are committed to reducing environmental impact through efficient training and model optimization.
## Technical Specifications
### Model Architecture
| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
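From the table above, the per-sequence KV-cache footprint at full context can be estimated with a back-of-the-envelope calculation (the head dimension is assumed to follow the usual hidden_size / attention_heads convention):

```python
# Architecture values from the table above
hidden_size = 5120
attn_heads = 40
kv_heads = 8          # GQA
layers = 48
context = 32768
bytes_per_elem = 2    # BF16

head_dim = hidden_size // attn_heads  # 128, assumed convention

# K and V caches across all layers for one full-context sequence
kv_bytes = 2 * layers * context * kv_heads * head_dim * bytes_per_elem
print(f"KV cache: {kv_bytes / 2**30:.1f} GiB")  # 6.0 GiB

# GQA stores 8 KV heads instead of 40, a 5x cache reduction vs full MHA
print(f"GQA saving: {attn_heads // kv_heads}x")
```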
### Special Features
- **Tool Calling:** JSON-based function calling
- **Symbolic Solver:** SymPy integration
- **Code Execution:** Sandboxed Python runtime
- **LaTeX Formatting:** Automatic equation formatting
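The exact tool-calling schema is defined by the repository; purely as an illustration of how a JSON-based call like the one described above might be dispatched, here is a minimal sketch (the tool names, JSON fields, and registry are hypothetical):

```python
import json
import sympy as sp

# Hypothetical registry mapping tool names to implementations
TOOLS = {
    "calculator": lambda args: str(sp.sympify(args["expression"]).evalf()),
    "derivative": lambda args: str(sp.diff(sp.sympify(args["expression"]),
                                           sp.symbols(args["variable"]))),
}

def dispatch(tool_call_json: str) -> str:
    """Parse a JSON tool call and execute the matching tool."""
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](call["arguments"])

# Example: the model emits a derivative request as JSON
call = '{"name": "derivative", "arguments": {"expression": "x**3 + 2*x**2", "variable": "x"}}'
print(dispatch(call))  # 3*x**2 + 4*x
```

In the real system the tool result would be fed back into the conversation for the model to integrate into its solution.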
## Citation
```bibtex
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
```
## Model Card Authors
Qiling Research
## Ethical Considerations
### Educational Impact
- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not replacement
### Accessibility
- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen gap if access is unequal
### Verification
- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education
## Glossary
- **Tool Calling:** Ability to invoke external functions for computation
- **Symbolic Solver:** Algebraic manipulation system (SymPy)
- **GQA:** Grouped Query Attention for efficiency
- **RoPE:** Rotary Position Embedding
- **YaRN:** Yet another RoPE extensioN, a method for extending the context window