	# Model Card for Kirim-1-Math

	## Model Details

	### Model Description

	Kirim-1-Math is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series with built-in tool calling, which lets it run mathematical computations, perform symbolic manipulation, and execute code for numerical solutions.

	- Developed by: Kirim AI Team
	- Model type: Causal Language Model (Decoder-only Transformer)
	- Language(s): Chinese, English
	- License: Apache 2.0
	- Base Model: Kirim-V1-base (expanded from 13B to 30B)
	- Specialization: Mathematical reasoning, theorem proving, symbolic computation

	### Model Capabilities

	- Mathematical Reasoning: Solve problems from elementary to olympiad level
	- Tool Calling: Invoke calculator, symbolic solver, differentiation, integration, and code-execution tools
	- Step-by-Step Solutions: Show detailed work for problem-solving
	- LaTeX Output: Format mathematical expressions properly
	- Bilingual: Handle problems in both Chinese and English
	- Code Generation: Write and execute Python/SymPy code for numerical solutions

	## Model Sources

	- Repository: [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
	- Paper: [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
	- Demo: [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
	- Base Model: [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)

	## Uses

	### Direct Use

	The model can be used directly for:

	- Educational Tutoring: Explain mathematical concepts with step-by-step reasoning
	- Homework Assistance: Solve problems across all difficulty levels
	- Competition Preparation: Practice for AMC, AIME, IMO, and Putnam
	- Research Assistance: Verify proofs and perform symbolic computations
	- Code-Assisted Problem Solving: Use numerical methods for complex calculations

	### Downstream Use

	Fine-tuning possibilities:

	- Domain-specific mathematical applications (physics, engineering, finance)
	- Custom tool integration for specialized computations
	- Educational platforms with adaptive difficulty
	- Mathematical theorem proving systems

	### Out-of-Scope Use

	The model should NOT be used for:

	- Academic dishonesty: Cheating on exams or assignments
	- Safety-critical systems: Without human verification (e.g., structural engineering calculations)
	- Financial advice: Trading or investment decisions without expert review
	- Medical calculations: Drug dosages or medical equipment calibration
	- Legal matters: Calculations used in legal contexts without verification by a qualified mathematician or lawyer

	## Bias, Risks, and Limitations

	### Known Limitations

	Technical Limitations:
	- Cannot process visual mathematics (diagrams, geometric figures)
	- May struggle with extremely novel mathematical concepts
	- Knowledge is limited to training data through October 2024
	- Tool execution can fail for edge cases
	- Performance degrades on extremely complex graduate-level problems

	Reasoning Limitations:
	- May make logical errors in complex proofs
	- Can hallucinate intermediate steps
	- Occasionally produces incorrect final answers
	- May not recognize when a problem has no solution

	Computational Limitations:
	- Cannot perform arbitrarily large calculations without tools
	- Numerical precision limited by underlying libraries
	- May time out on very long computations

	### Risks and Biases

	Potential Risks:
	- Students may become over-reliant on AI assistance
	- Could generate plausible but incorrect mathematical reasoning
	- May perpetuate biases in mathematical education approaches
	- Tool execution could consume excessive computational resources

	Mitigation Strategies:
	- Always verify critical results with human experts
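	- Use a low sampling temperature (e.g., temperature=0.1) for more consistent mathematical reasoning; this reduces run-to-run variation but does not make output fully deterministic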
	- Enable tool calling for numerical verification
	- Cross-check answers with multiple methods
	- Implement appropriate safeguards in educational settings
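
Cross-checking can be as simple as substituting a candidate answer back into the problem. The sketch below is illustrative only: `poly_eval` and `check_quadratic_roots` are hypothetical helper names, not part of the Kirim-1-Math tooling. It checks claimed roots of x² - 5x + 6 = 0 in two independent ways, by direct substitution and by Vieta's formulas, using exact rational arithmetic.

```python
from fractions import Fraction


def poly_eval(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs are highest degree first."""
    result = Fraction(0)
    for c in coeffs:
        result = result * x + Fraction(c)
    return result


def check_quadratic_roots(a, b, c, roots):
    """Return True only if both checks agree that `roots` solve ax^2 + bx + c = 0.

    `roots` must list both roots, repeating a double root.
    """
    r1, r2 = (Fraction(r) for r in roots)
    # Method 1: substitute each candidate root back into the polynomial.
    by_substitution = all(poly_eval([a, b, c], r) == 0 for r in (r1, r2))
    # Method 2: Vieta's formulas, r1 + r2 = -b/a and r1 * r2 = c/a.
    by_vieta = (r1 + r2 == Fraction(-b, a)) and (r1 * r2 == Fraction(c, a))
    return by_substitution and by_vieta


print(check_quadratic_roots(1, -5, 6, (2, 3)))  # True: the roots are correct
print(check_quadratic_roots(1, -5, 6, (2, 4)))  # False: 4 is not a root
```

A model answer that fails either check is a signal to re-run the problem or escalate to a human reviewer.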

	## How to Get Started

	### Installation

	```bash
	pip install torch transformers accelerate sympy
	```

	### Basic Usage

	```python
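	from transformers import AutoModelForCausalLM, AutoTokenizer

	model_id = "Kirim-ai/Kirim-1-Math"

	# Load model and tokenizer (trust_remote_code runs code shipped in the model repo)
	model = AutoModelForCausalLM.from_pretrained(
	    model_id,
	    torch_dtype="auto",
	    device_map="auto",
	    trust_remote_code=True,
	)
	tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

	# Solve a problem
	messages = [
	    {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
	]

	# add_generation_prompt=True appends the assistant turn marker so the model answers
	inputs = tokenizer.apply_chat_template(
	    messages,
	    add_generation_prompt=True,
	    return_dict=True,
	    return_tensors="pt",
	).to(model.device)

	# temperature only takes effect when sampling is enabled
	outputs = model.generate(
	    **inputs, max_new_tokens=2048, do_sample=True, temperature=0.1
	)

	# Decode only the newly generated tokens, not the prompt
	prompt_length = inputs["input_ids"].shape[-1]
	print(tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True))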
	```

	### Using the Inference Script

	```bash
	# Interactive mode
	python inference_math.py --interactive

	# Single problem
	python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"

	# With quantization
	python inference_math.py --load_in_4bit --interactive
	```

	## Training Details

	### Training Data

	Mathematical Corpus (500B tokens):
	- Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
	- Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
	- arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
	- Textbooks: undergraduate to graduate level (75B tokens)
	- Q&A: Math StackExchange, MathOverflow (50B tokens)

	Code Corpus (200B tokens):
	- Mathematical Python libraries (NumPy, SymPy, SciPy)
	- Computational notebooks from Kaggle, GitHub
	- Algorithm implementations

	General Corpus (800B tokens):
	- From Kirim-V1-base pre-training

	Total: 1.5 Trillion tokens
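
The token counts above can be checked and turned into mixture shares with a few lines of arithmetic. This is only a sketch of the bookkeeping: the card does not say whether the corpora were sampled in proportion to their size, so the shares below describe the data inventory, not necessarily the sampling weights.

```python
# Token counts in billions, copied from the lists above.
math_parts_b = {
    "proofs": 125,
    "olympiad": 150,
    "arxiv": 100,
    "textbooks": 75,
    "qa": 50,
}
corpus_tokens_b = {"math": 500, "code": 200, "general": 800}

# The mathematical sub-corpora should add up to the stated 500B.
assert sum(math_parts_b.values()) == corpus_tokens_b["math"]

total_b = sum(corpus_tokens_b.values())
shares = {name: count / total_b for name, count in corpus_tokens_b.items()}

print(f"total: {total_b}B tokens")  # total: 1500B tokens
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```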

	### Training Procedure

	#### Stage 1: Model Expansion (15 days)
	- Expanded from 13B to 30B parameters
	- Progressive width and depth scaling
	- Hidden size: 4096 → 5120
	- Layers: 32 → 48

	#### Stage 2: Mathematical Pre-training (30 days)
	- 500B tokens of mathematical content
	- Hardware: 512x NVIDIA H100 80GB
	- Batch size: 2048
	- Learning rate: 1.5e-4 with cosine decay
	- Optimization: AdamW, BF16 precision

	#### Stage 3: Instruction Tuning (5 days)
	- 200K mathematical instruction-response pairs
	- Balanced across algebra, calculus, geometry, etc.
	- Learning rate: 2e-5
	- 3 epochs

	#### Stage 4: Tool Calling Training (3 days)
	- 50K tool-calling examples
	- Function definition and execution
	- Error handling and recovery

	#### Stage 5: Reinforcement Learning (7 days)
	- PPO-based training
	- Reward based on solution correctness
	- Symbolic and numerical verification
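
The card does not publish the reward function, so the following is only a minimal sketch of what a correctness reward with exact and numerical checks might look like. The names `parse_number` and `correctness_reward`, the binary 0/1 reward, and the tolerance are all assumptions made for illustration.

```python
import math
from fractions import Fraction


def parse_number(text):
    """Parse '3/4', '0.75', or '2' into a Fraction; return None if not numeric."""
    try:
        return Fraction(text.strip())
    except (ValueError, ZeroDivisionError):
        return None


def correctness_reward(predicted, reference, rel_tol=1e-9):
    """Return 1.0 when the final answer matches the reference, else 0.0."""
    pred, ref = parse_number(predicted), parse_number(reference)
    if pred is None or ref is None:
        # Non-numeric answers: fall back to an exact string comparison.
        return 1.0 if predicted.strip() == reference.strip() else 0.0
    if pred == ref:
        # Exact check: '0.75' and '3/4' are the same rational number.
        return 1.0
    # Numerical check: accept answers that agree within a relative tolerance.
    close = math.isclose(float(pred), float(ref), rel_tol=rel_tol)
    return 1.0 if close else 0.0


print(correctness_reward("0.75", "3/4"))  # 1.0
print(correctness_reward("0.7", "3/4"))   # 0.0
```

A production reward would likely need a real symbolic equivalence check (for example with SymPy) for answers such as `x + 1` versus `1 + x`, which this string fallback treats as different.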

	#### Training Hyperparameters

	- Optimizer: AdamW
	- Learning rate: 1.5e-4 (pre-training), 2e-5 (fine-tuning)
	- Weight decay: 0.1
	- Warmup steps: 2000
	- Gradient clipping: 1.0
	- Precision: BFloat16
	- Total GPU hours: 30,720
	- Estimated cost: $450,000 USD
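
To make the learning-rate settings concrete, here is a sketch of a linear-warmup, cosine-decay schedule using the pre-training peak rate and warmup length listed above. The total step count and the final learning rate are not given in the card, so `total_steps` and `min_lr` are placeholder assumptions, as is the linear shape of the warmup.

```python
import math


def lr_at_step(step, peak_lr=1.5e-4, warmup_steps=2000,
               total_steps=100_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr.

    peak_lr and warmup_steps come from the hyperparameter list;
    total_steps and min_lr are assumed values chosen for illustration.
    """
    if step < warmup_steps:
        # Ramp up linearly, reaching peak_lr on the last warmup step.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


for step in (0, 1999, 2000, 51_000, 100_000):
    print(step, lr_at_step(step))
```

The rate peaks at 1.5e-4 at the end of warmup, is about half of that midway through the decay phase, and reaches `min_lr` at `total_steps`.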

	### Compute Infrastructure

	- Pre-training: 512x NVIDIA H100 80GB GPUs
	- Fine-tuning: 128x NVIDIA H100 80GB GPUs
	- Framework: PyTorch 2.1, DeepSpeed ZeRO-3
	- Parallelism: Tensor (8-way), Pipeline (4-way), Data (16-way)
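
As a back-of-the-envelope check, the stage durations and GPU counts can be combined into a GPU-hour estimate. The assignment of stages to clusters is an assumption (stages 1 and 2 on the 512-GPU pre-training cluster, stages 3 to 5 on the 128-GPU fine-tuning cluster), as is the premise that every stage ran around the clock.

```python
# (stage, days, gpus): days come from the stage headings; the GPU count
# per stage is an assumption, since the card only names two cluster sizes.
stages = [
    ("model expansion", 15, 512),
    ("mathematical pre-training", 30, 512),
    ("instruction tuning", 5, 128),
    ("tool-calling training", 3, 128),
    ("reinforcement learning", 7, 128),
]

total_days = sum(days for _, days, _ in stages)
gpu_hours = sum(days * 24 * gpus for _, days, gpus in stages)
upper_bound = total_days * 24 * 512  # if every stage used all 512 GPUs

# The three parallelism degrees multiply out to the pre-training GPU count.
parallel_gpus = 8 * 4 * 16

print(total_days)     # 60
print(gpu_hours)      # 599040
print(upper_bound)    # 737280
print(parallel_gpus)  # 512
```

Under these assumptions the total falls between roughly 599,000 and 737,000 GPU hours. The reported figure of 30,720 equals 512 × 60, which corresponds to GPU-days at the full cluster size, so the unit of that figure may be worth confirming.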

	## Evaluation

	### Mathematical Reasoning

	| Benchmark | Score | Comparison |
	|-----------|-------|------------|
	| GSM8K | 94.2% | GPT-4: 92.0% |
	| MATH | 78.5% | GPT-4: 76.4% |
	| MMLU-Math | 88.7% | GPT-4: 86.9% |
	| AMC10/12 | 72.3% | Human avg: 45% |
	| AIME | 38.7% | Human qualifier: 40% |
	### Tool Calling

	| Metric | Score |
	|--------|-------|
	| Tool Selection | 96.8% |
	| Parameter Extraction | 94.2% |
	| Execution Success | 92.5% |
	| Result Integration | 95.1% |

	### Code Generation

	| Task | Pass@1 | Pass@10 |
	|------|--------|---------|
	| HumanEval-Math | 78.3% | 92.1% |
	| SymPy Tasks | 82.5% | 94.7% |
	| NumPy Tasks | 75.6% | 89.3% |
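
The card does not say how Pass@1 and Pass@10 were computed. A common choice is the unbiased estimator introduced with the HumanEval benchmark: generate n samples per task, count the c that pass, and estimate the chance that at least one of k randomly drawn samples passes. The sketch below assumes that method; the sample counts in the example are made up.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples of which c pass (requires k <= n)."""
    if n - c < k:
        # Fewer than k failures: every size-k subset contains a passing sample.
        return 1.0
    # 1 minus the probability that all k drawn samples are failures.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical task: 20 samples generated, 5 of them pass the tests.
print(pass_at_k(20, 5, 1))   # 0.25
print(pass_at_k(20, 5, 10))
```

The benchmark score is the mean of this value over all tasks.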

	### Performance

	- Inference Speed: 45 tokens/second (A100 80GB)
	- Memory: 60GB (BF16), 30GB (INT8), 20GB (INT4)
	- Latency: 89ms mean, 145ms p95
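
The memory figures follow approximately from the parameter count and the bits stored per weight. The sketch below covers the weights only; activations, the KV cache, and quantization metadata are excluded, and attributing the gap between the raw INT4 estimate and the listed 20GB to that overhead is an assumption.

```python
def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


NUM_PARAMS = 30e9  # stated model size

for name, bits in [("BF16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gb(NUM_PARAMS, bits):.0f} GB")
# BF16: 60 GB
# INT8: 30 GB
# INT4: 15 GB
```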

	## Environmental Impact

	- Hardware: NVIDIA H100 GPUs
	- Training Time: 60 days (30,720 GPU hours)
	- Estimated CO₂: ~8,500 kg CO₂eq
	- Power Consumption: ~850 MWh

	We are committed to reducing environmental impact through efficient training and model optimization.

	## Technical Specifications

	### Model Architecture

	| Parameter | Value |
	|-----------|-------|
	| Parameters | 30B |
	| Hidden Size | 5,120 |
	| Layers | 48 |
	| Attention Heads | 40 |
	| KV Heads | 8 (GQA) |
	| Intermediate Size | 13,824 |
	| Vocabulary | 102,400 |
	| Context Length | 32,768 |
	| Position Encoding | RoPE with YaRN |
	| Activation | SiLU |
	| Normalization | RMSNorm |
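
The table's dimensions can be plugged into a rough parameter and KV-cache estimate. The layer layout here is an assumption based on common decoder designs: bias-free projections, a gated MLP with three weight matrices (suggested by the SiLU activation), two RMSNorm weights per layer, and untied input and output embeddings. The model's configuration file remains the authoritative source.

```python
def estimate_params(hidden, layers, heads, kv_heads, intermediate, vocab,
                    tied_embeddings=False):
    """Rough parameter count for a decoder-only Transformer with GQA."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attn = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
    mlp = 3 * hidden * intermediate                   # gate, up, and down projections
    norms = 2 * hidden                                # two RMSNorm weights per layer
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * (attn + mlp + norms) + embeddings + hidden  # + final norm


def kv_cache_bytes_per_token(layers, kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache per token: keys and values for every layer, in BF16."""
    return 2 * layers * kv_heads * head_dim * bytes_per_value


total = estimate_params(hidden=5120, layers=48, heads=40, kv_heads=8,
                        intermediate=13824, vocab=102400)
print(f"{total / 1e9:.1f}B parameters")  # 14.3B parameters

per_token = kv_cache_bytes_per_token(layers=48, kv_heads=8, head_dim=128)
print(per_token * 32768 / 2**30)  # 6.0 (GiB at the full 32,768-token context)
```

With 8 KV heads instead of 40, the cache is one fifth the size it would be under standard multi-head attention. Note that under the stated assumptions the parameter estimate comes to about 14.3B, well below the listed 30B, so either the real layer layout differs from the one assumed here or some dimensions in the table differ from the released weights.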

	### Special Features

	- Tool Calling: JSON-based function calling
	- Symbolic Solver: SymPy integration
	- Code Execution: Sandboxed Python runtime
	- LaTeX Formatting: Automatic equation formatting
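
The card does not document the JSON schema used for tool calls, so the example below is a hypothetical sketch of how a host application might parse one call and route it to a tool. The `{"name": ..., "arguments": {...}}` shape, the `calculator` tool name, and the `dispatch` and `eval_arithmetic` helpers are all assumptions. The calculator walks the expression's syntax tree instead of calling `eval()`, so only plain arithmetic is accepted.

```python
import ast
import json
import operator

_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}


def eval_arithmetic(expression):
    """Evaluate +, -, *, /, ** on numeric literals; reject everything else.

    This restricts what can run but does not bound how long it runs.
    """
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


# Hypothetical tool registry: tool name -> function taking the arguments dict.
TOOLS = {"calculator": lambda args: eval_arithmetic(args["expression"])}


def dispatch(tool_call_json):
    """Parse one tool call and return a JSON result or a JSON error."""
    try:
        call = json.loads(tool_call_json)
        result = TOOLS[call["name"]](call["arguments"])
        return json.dumps({"ok": True, "result": result})
    except (KeyError, ValueError, TypeError, SyntaxError,
            ZeroDivisionError, OverflowError) as exc:
        return json.dumps({"ok": False, "error": f"{type(exc).__name__}: {exc}"})


call = '{"name": "calculator", "arguments": {"expression": "2 * (3 + 4)"}}'
print(dispatch(call))  # {"ok": true, "result": 14}
```

Returning errors as structured JSON rather than raising lets the model see the failure and retry, which matches the error-handling and recovery behavior described in the training stages.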

	## Citation

	```bibtex
	@misc{kirim2025math,
	  title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
	  author={Qiling Research},
	  year={2025},
	  publisher={Kirim AI},
	  url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
	}
	```

	## Model Card Authors

	Qiling Research

	## Ethical Considerations

	### Educational Impact

	- May affect traditional mathematics education
	- Could reduce development of mental math skills
	- Should be used as a learning aid, not replacement

	### Accessibility

	- Makes advanced mathematics more accessible
	- Could democratize STEM education
	- May widen gap if access is unequal

	### Verification

	- Always verify results for critical applications
	- Use multiple methods for important calculations
	- Maintain human oversight in education

	## Glossary

	- Tool Calling: Ability to invoke external functions for computation
	- Symbolic Solver: Algebraic manipulation system (SymPy)
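	- GQA: Grouped-Query Attention, in which groups of query heads share a smaller set of key/value heads to reduce KV-cache memory
	- RoPE: Rotary Position Embedding
	- YaRN: Yet another RoPE extensioN, a method for extending the context window of RoPE-based models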