# Model Card for Kirim-1-Math

## Model Details

### Model Description

**Kirim-1-Math** is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series with built-in tool calling, which lets it invoke external tools for numerical computation, symbolic manipulation, and code execution.

- **Developed by:** Kirim AI Team
- **Model type:** Causal language model (decoder-only Transformer)
- **Language(s):** Chinese, English
- **License:** Apache 2.0
- **Base Model:** Kirim-V1-base (expanded from 13B to 30B parameters)
- **Specialization:** Mathematical reasoning, theorem proving, symbolic computation

### Model Capabilities

- **Mathematical Reasoning**: Solve problems from elementary to olympiad level
- **Tool Calling**: Invoke tools for arithmetic, symbolic equation solving, differentiation, integration, and code execution
- **Step-by-Step Solutions**: Show detailed working for each problem
- **LaTeX Output**: Format mathematical expressions in LaTeX
- **Bilingual**: Handle problems in both Chinese and English
- **Code Generation**: Write and execute Python/SymPy code for numerical solutions

## Model Sources

- **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
- **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
- **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
- **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)

## Uses

### Direct Use

The model can be used directly for:

- **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
- **Homework Assistance**: Solve problems across a wide range of difficulty levels
- **Competition Preparation**: Practice for AMC, AIME, IMO, and Putnam
- **Research Assistance**: Verify proofs and perform symbolic computations
- **Code-Assisted Problem Solving**: Apply numerical methods to complex calculations

### Downstream Use

The model can be fine-tuned for:

- Domain-specific mathematical applications (physics, engineering, finance)
- Custom tool integration for specialized computations
- Educational platforms with adaptive difficulty
- Mathematical theorem proving systems

### Out-of-Scope Use

The model should NOT be used for:

- **Academic dishonesty**: Cheating on exams or assignments
- **Safety-critical systems**: Calculations used without human verification (e.g., structural engineering)
- **Financial advice**: Trading or investment decisions without expert review
- **Medical calculations**: Drug dosages or medical equipment calibration
- **Legal matters**: Calculations used without verification by a professional mathematician or lawyer

## Bias, Risks, and Limitations

### Known Limitations

**Technical Limitations:**

- Cannot process visual mathematics (diagrams, geometric figures)
- May struggle with highly novel mathematical concepts
- Knowledge is limited to training data through October 2024
- Tool execution can fail on edge cases
- Performance degrades on very complex graduate-level problems

**Reasoning Limitations:**

- May make logical errors in complex proofs
- Can hallucinate intermediate steps
- Occasionally produces incorrect final answers
- May not recognize when a problem has no solution

**Computational Limitations:**

- Cannot perform arbitrarily large calculations without tools
- Numerical precision is limited by the underlying libraries
- May time out on very long computations

### Risks and Biases

**Potential Risks:**

- Students may become over-reliant on AI assistance
- Could generate plausible but incorrect mathematical reasoning
- May perpetuate biases in mathematical education approaches
- Tool execution could consume excessive computational resources

**Mitigation Strategies:**

- Always have human experts verify critical results
- Use a low sampling temperature (e.g., 0.1) for more consistent reasoning, or greedy decoding when fully deterministic output is required
- Enable tool calling for numerical verification
- Cross-check answers with multiple methods
- Implement appropriate safeguards in educational settings
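
As an illustration of cross-checking with multiple methods, here is a minimal sketch for one common case: checking a claimed derivative. It uses SymPy, which is already in the installation list. The result is accepted only if a symbolic comparison and a central finite-difference estimate both agree. The function name `check_derivative`, the sample points, the step `h`, and the tolerance are illustrative choices and are not part of the Kirim tooling.

```python
import sympy as sp


def check_derivative(expr: str, claimed: str, points=(-2.0, -0.5, 0.7, 3.0),
                     h: float = 1e-6, tol: float = 1e-4) -> bool:
    """Accept a claimed d/dx only if two independent methods agree (illustrative helper)."""
    x = sp.Symbol("x")
    f_expr, g_expr = sp.sympify(expr), sp.sympify(claimed)

    # Method 1 (symbolic): the difference from SymPy's own derivative simplifies to 0.
    symbolic_ok = sp.simplify(sp.diff(f_expr, x) - g_expr) == 0

    # Method 2 (numerical): central finite differences at a few sample points.
    f = sp.lambdify(x, f_expr, modules="math")
    g = sp.lambdify(x, g_expr, modules="math")
    numeric_ok = all(
        abs((f(p + h) - f(p - h)) / (2 * h) - g(p)) < tol for p in points
    )
    return symbolic_ok and numeric_ok
```

For the derivative problem used in the inference-script example, `check_derivative("x**3 + 2*x**2", "3*x**2 + 4*x")` passes and `check_derivative("x**3 + 2*x**2", "3*x**2 + 2*x")` fails. Requiring both checks to pass guards against a symbolic simplification that is silently wrong and against a numerical check that happens to agree at the chosen points.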

## How to Get Started

### Installation

```bash
pip install torch transformers accelerate sympy
```

### Basic Usage

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kirim-ai/Kirim-1-Math"

# Load model and tokenizer (trust_remote_code allows the repository's custom code to run)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

# Solve a problem
messages = [
    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
]
# add_generation_prompt=True appends the assistant turn header so the model starts answering
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
# temperature only takes effect when sampling is enabled
outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
# Decode only the newly generated tokens, not the echoed prompt
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

### Using the Inference Script

```bash
# Interactive mode
python inference_math.py --interactive

# Single problem
python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"

# With quantization
python inference_math.py --load_in_4bit --interactive
```

## Training Details

### Training Data

**Mathematical Corpus (500B tokens):**

- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
- Textbooks: undergraduate to graduate level (75B tokens)
- Q&A: Math StackExchange, MathOverflow (50B tokens)

**Code Corpus (200B tokens):**

- Mathematical Python libraries (NumPy, SymPy, SciPy)
- Computational notebooks from Kaggle and GitHub
- Algorithm implementations

**General Corpus (800B tokens):**

- From Kirim-V1-base pre-training

**Total: 1.5 trillion tokens**
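
The listed token counts add up to the stated total. The short calculation below uses only the figures given in this section. It also derives each corpus's share of the training mix, which the card does not state explicitly.

```python
# Token counts in billions, copied from the training data breakdown.
math_sources = {"proofs": 125, "olympiad": 150, "arxiv": 100, "textbooks": 75, "qa": 50}
corpora = {
    "math": sum(math_sources.values()),  # 500B
    "code": 200,
    "general": 800,
}

total = sum(corpora.values())  # 1,500B = 1.5T tokens
shares = {name: count / total for name, count in corpora.items()}

for name, share in shares.items():
    print(f"{name:>8}: {corpora[name]:>5}B tokens ({share:.1%})")
print(f"   total: {total}B tokens")
```

Mathematical content makes up about a third of the mix (33.3%), code about 13.3%, and general text about 53.3%.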

### Training Procedure

#### Stage 1: Model Expansion (15 days)

- Expanded from 13B to 30B parameters
- Progressive width and depth scaling
- Hidden size: 4096 → 5120
- Layers: 32 → 48

#### Stage 2: Mathematical Pre-training (30 days)

- 500B tokens of mathematical content
- Hardware: 512x NVIDIA H100 80GB
- Batch size: 2048
- Learning rate: 1.5e-4 with cosine decay
- Optimization: AdamW, BF16 precision

#### Stage 3: Instruction Tuning (5 days)

- 200K mathematical instruction-response pairs
- Balanced across algebra, calculus, geometry, and other topics
- Learning rate: 2e-5
- 3 epochs

#### Stage 4: Tool Calling Training (3 days)

- 50K tool-calling examples
- Function definition and execution
- Error handling and recovery

#### Stage 5: Reinforcement Learning (7 days)

- PPO-based training
- Reward based on solution correctness
- Symbolic and numerical verification
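
The card says the PPO reward is based on correctness, checked symbolically and numerically, but it does not publish the reward function. The sketch below shows one way such a check could work. It assumes a binary reward: 1.0 when the model's final answer is equivalent to a reference answer, 0.0 otherwise, including when the answer cannot be parsed. The function `correctness_reward`, the sampling range, the number of trials, and the tolerance are all hypothetical.

```python
import random

import sympy as sp


def correctness_reward(answer: str, reference: str, trials: int = 5,
                       tol: float = 1e-9) -> float:
    """Hypothetical binary reward: 1.0 if answer is equivalent to reference, else 0.0."""
    try:
        a, r = sp.sympify(answer), sp.sympify(reference)
        # Symbolic verification: the difference simplifies to exactly zero.
        if sp.simplify(a - r) == 0:
            return 1.0
        # Numerical verification: simplify() can miss true identities, so also
        # compare both sides at random points before rejecting the answer.
        rng = random.Random(0)  # fixed seed keeps the reward reproducible
        symbols = sorted((a - r).free_symbols, key=str)
        for _ in range(trials):
            point = {s: rng.uniform(-3.0, 3.0) for s in symbols}
            gap = complex(sp.N((a - r).subs(point)))
            if not abs(gap) <= tol:  # written this way so NaN also fails
                return 0.0
        return 1.0
    except (sp.SympifyError, TypeError, ValueError):
        return 0.0  # unparseable or non-numeric answers earn no reward
```

A purely symbolic check would reject correct answers whenever `simplify` fails to prove an identity. The numerical fallback trades that false negative for a small risk of accepting expressions that agree only at the sampled points.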

#### Training Hyperparameters

- **Optimizer:** AdamW
- **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
- **Weight decay:** 0.1
- **Warmup steps:** 2000
- **Gradient clipping:** 1.0
- **Precision:** BFloat16
- **Total GPU hours:** 30,720
- **Estimated cost:** $450,000 USD
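
The pre-training schedule combines 2,000 warmup steps with cosine decay from a peak learning rate of 1.5e-4. The sketch below implements that shape. Two parts are assumptions, because the card does not specify them: linear warmup, and a decay floor of 10% of the peak (`min_lr_ratio`). The total step count is also not given and is left as a parameter.

```python
import math


def lr_at(step: int, total_steps: int, peak_lr: float = 1.5e-4,
          warmup_steps: int = 2000, min_lr_ratio: float = 0.1) -> float:
    """Linear warmup to peak_lr, then cosine decay to min_lr_ratio * peak_lr (assumed floor)."""
    if step < warmup_steps:
        # Linear warmup: reaches peak_lr at the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    min_lr = min_lr_ratio * peak_lr
    # Half-cosine from peak_lr (progress = 0) down to min_lr (progress = 1).
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In PyTorch, the same curve can be applied with `torch.optim.lr_scheduler.LambdaLR` by returning `lr_at(step, total_steps) / peak_lr` as the multiplier. The function can be reused for fine-tuning with `peak_lr=2e-5`, if fine-tuning follows the same schedule.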

### Compute Infrastructure

- **Pre-training:** 512x NVIDIA H100 80GB GPUs
- **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
- **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
- **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)
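
The three parallelism degrees multiply to the pre-training GPU count: 8 × 4 × 16 = 512. The card does not say how ranks are laid out. The sketch below assumes one common choice: tensor parallelism is the fastest-varying index, so each 8-way tensor-parallel group fits on a single 8-GPU node. Under that assumption, it maps a global rank to its (data, pipeline, tensor) coordinates. The names `Coord` and `rank_to_coord` are illustrative.

```python
from typing import NamedTuple

TP, PP, DP = 8, 4, 16  # tensor, pipeline, data parallel degrees from the card
assert TP * PP * DP == 512  # matches the 512 GPUs used for pre-training


class Coord(NamedTuple):
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def rank_to_coord(rank: int, tp: int = TP, pp: int = PP, dp: int = DP) -> Coord:
    """Assumed layout: tensor-parallel fastest, then pipeline, then data parallel."""
    world = tp * pp * dp
    if not 0 <= rank < world:
        raise ValueError(f"rank {rank} outside world size {world}")
    return Coord(dp=rank // (tp * pp), pp=(rank // tp) % pp, tp=rank % tp)
```

Under this layout, rank 37 is `Coord(dp=1, pp=0, tp=5)` and the last rank, 511, is `Coord(dp=15, pp=3, tp=7)`.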

## Evaluation

### Mathematical Reasoning

| Benchmark | Score | Comparison |
|-----------|-------|------------|
| GSM8K | 94.2% | GPT-4: 92.0% |
| MATH | 78.5% | GPT-4: 76.4% |
| MMLU-Math | 88.7% | GPT-4: 86.9% |
| AMC10/12 | 72.3% | Human avg: 45% |
| AIME | 38.7% | Human qualifier: 40% |

### Tool Calling

| Metric | Score |
|--------|-------|
| Tool Selection | 96.8% |
| Parameter Extraction | 94.2% |
| Execution Success | 92.5% |
| Result Integration | 95.1% |

### Code Generation

| Task | Pass@1 | Pass@10 |
|------|--------|---------|
| HumanEval-Math | 78.3% | 92.1% |
| SymPy Tasks | 82.5% | 94.7% |
| NumPy Tasks | 75.6% | 89.3% |
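
The card does not state how Pass@1 and Pass@10 were computed, or how many samples were drawn per task. A common approach is the unbiased estimator introduced with the HumanEval benchmark. For each task, generate n samples, count the c samples that pass, and estimate the probability that at least one of k randomly chosen samples passes. The per-task values are then averaged. The sketch below implements that estimator for a single task.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k for one task: 1 - C(n - c, k) / C(n, k)."""
    if n - c < k:
        return 1.0  # every size-k subset must contain at least one passing sample
    return 1.0 - comb(n - c, k) / comb(n, k)
```

For example, with n = 20 samples of which c = 5 pass, pass@1 is 1 - 15/20 = 0.25, and pass@10 is 1 - C(15, 10)/C(20, 10) ≈ 0.984.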

### Performance

- **Inference Speed:** 45 tokens/second (A100 80GB)
- **Memory:** 60GB (BF16), 30GB (INT8), 20GB (INT4)
- **Latency:** 89ms mean, 145ms p95
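
The memory figures follow from the parameter count multiplied by the bytes per weight. The rough, weights-only estimate below uses 1 GB = 10^9 bytes. It ignores the KV cache, activations, and quantization metadata.

```python
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(num_params: float, fmt: str) -> float:
    """Weights-only memory in GB (1e9 bytes); excludes KV cache, activations, quant scales."""
    return num_params * BYTES_PER_PARAM[fmt] / 1e9


for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: {weight_memory_gb(30e9, fmt):.0f} GB")
```

For 30B parameters, this gives 60 GB for BF16, 30 GB for INT8, and 15 GB for INT4. The card lists 20GB for INT4, which is higher than the weights-only figure. The gap is presumably due to quantization scales, any layers kept in higher precision, and runtime overhead, none of which this estimate models.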

## Environmental Impact

- **Hardware:** NVIDIA H100 GPUs
- **Training Time:** 60 days (30,720 GPU hours)
- **Estimated CO₂:** ~8,500 kg CO₂eq
- **Power Consumption:** ~850 MWh

We are committed to reducing environmental impact through efficient training and model optimization.

## Technical Specifications

### Model Architecture

| Parameter | Value |
|-----------|-------|
| Parameters | 30B |
| Hidden Size | 5,120 |
| Layers | 48 |
| Attention Heads | 40 |
| KV Heads | 8 (GQA) |
| Intermediate Size | 13,824 |
| Vocabulary | 102,400 |
| Context Length | 32,768 |
| Position Encoding | RoPE with YaRN |
| Activation | SiLU |
| Normalization | RMSNorm |
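
Grouped-query attention mainly reduces the size of the KV cache at inference time. The sketch below estimates that cache from the table above. It makes three assumptions not stated in the card: a BF16 cache (2 bytes per element), batch size 1, and the usual head dimension of hidden size / attention heads = 5120 / 40 = 128.

```python
def kv_cache_bytes(seq_len: int, batch: int = 1, layers: int = 48, hidden: int = 5120,
                   heads: int = 40, kv_heads: int = 8, bytes_per_elem: int = 2) -> int:
    """KV-cache size: one K and one V tensor per layer, kv_heads * head_dim wide per token."""
    head_dim = hidden // heads  # assumed: 5120 / 40 = 128
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * seq_len * batch


per_token = kv_cache_bytes(1)                         # 196,608 bytes = 192 KiB
full_context = kv_cache_bytes(32_768)                 # 6 GiB at the full context length
without_gqa = kv_cache_bytes(32_768, kv_heads=40)     # 30 GiB if every head kept its own K/V
```

With 8 KV heads instead of 40, the cache for a full 32,768-token context shrinks fivefold, from 30 GiB to 6 GiB per sequence.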

### Special Features

- **Tool Calling:** JSON-based function calling
- **Symbolic Solver:** SymPy integration
- **Code Execution:** Sandboxed Python runtime
- **LaTeX Formatting:** Automatic equation formatting
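
The card describes JSON-based function calling with SymPy integration but does not document the call schema or tool names. The sketch below shows what a host-side dispatcher for such calls could look like. It assumes a call format of `{"name": ..., "arguments": {...}}` and the tool names `solve`, `derivative`, and `integrate`. All of these are hypothetical; check the repository for the actual schema. `sympify` evaluates strings with `eval`, so in practice this should run inside the sandbox mentioned above.

```python
import json

import sympy as sp


def _solve(expression: str, variable: str = "x") -> str:
    # "lhs = rhs" is rewritten as lhs - rhs = 0; a bare expression is set equal to 0.
    lhs, _, rhs = expression.partition("=")
    equation = sp.sympify(lhs.strip()) - sp.sympify(rhs.strip() or "0")
    return str(sp.solve(equation, sp.Symbol(variable)))


def _derivative(expression: str, variable: str = "x") -> str:
    return str(sp.diff(sp.sympify(expression), sp.Symbol(variable)))


def _integrate(expression: str, variable: str = "x") -> str:
    return str(sp.integrate(sp.sympify(expression), sp.Symbol(variable)))


# Hypothetical tool registry: tool name -> Python implementation.
TOOLS = {"solve": _solve, "derivative": _derivative, "integrate": _integrate}


def dispatch(tool_call: str) -> str:
    """Run one JSON tool call and return a JSON string to feed back to the model."""
    try:
        call = json.loads(tool_call)
        tool = TOOLS[call["name"]]
        return json.dumps({"result": tool(**call.get("arguments", {}))})
    except (ValueError, KeyError, TypeError, NotImplementedError) as exc:
        # Malformed JSON, unknown tools, bad arguments, and unparseable math
        # (SympifyError subclasses ValueError) are reported instead of raised.
        return json.dumps({"error": f"{type(exc).__name__}: {exc}"})
```

For example, `dispatch('{"name": "solve", "arguments": {"expression": "x**2 - 5*x + 6 = 0"}}')` returns the roots 2 and 3 as a JSON result. Errors come back as a JSON `error` field rather than raising. This lets the model see what went wrong and retry, in line with the error-handling and recovery training in Stage 4.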

## Citation

```bibtex
@misc{kirim2025math,
  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
  author={Qiling Research},
  year={2025},
  publisher={Kirim AI},
  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
}
```

## Model Card Authors

Qiling Research

## Ethical Considerations

### Educational Impact

- May affect traditional mathematics education
- Could reduce development of mental math skills
- Should be used as a learning aid, not a replacement for learning

### Accessibility

- Makes advanced mathematics more accessible
- Could democratize STEM education
- May widen the gap if access is unequal

### Verification

- Always verify results for critical applications
- Use multiple methods for important calculations
- Maintain human oversight in education

## Glossary

- **Tool Calling:** Ability to invoke external functions for computation
- **Symbolic Solver:** Computer algebra system for exact algebraic manipulation (here, SymPy)
- **GQA:** Grouped-Query Attention, in which several query heads share each key/value head to reduce KV-cache memory
- **RoPE:** Rotary Position Embedding
- **YaRN:** Yet another RoPE extensioN method, used to extend the context length of RoPE-based models