Song Yi committed on
Commit d5d55c9 · verified · 1 Parent(s): ca8c2ab

Create MODEL_CARD.md

Files changed (1)
  1. MODEL_CARD.md +330 -0
MODEL_CARD.md ADDED
@@ -0,0 +1,330 @@
+ # Model Card for Kirim-1-Math
+
+ ## Model Details
+
+ ### Model Description
+
+ **Kirim-1-Math** is a 30-billion-parameter large language model specialized for advanced mathematical reasoning and problem-solving. It is the first model in the Kirim series with built-in tool calling, which lets it run mathematical computations, symbolic manipulations, and code for numerical solutions.
+
+ - **Developed by:** Kirim AI Team
+ - **Model type:** Causal Language Model (Decoder-only Transformer)
+ - **Language(s):** Chinese, English
+ - **License:** Apache 2.0
+ - **Base Model:** Kirim-V1-base (expanded from 13B to 30B)
+ - **Specialization:** Mathematical reasoning, theorem proving, symbolic computation
+
+ ### Model Capabilities
+
+ - **Mathematical Reasoning**: Solve problems from elementary to olympiad level
+ - **Tool Calling**: Invoke calculator, symbolic solver, differentiation, integration, and code execution tools
+ - **Step-by-Step Solutions**: Show detailed work for problem-solving
+ - **LaTeX Output**: Format mathematical expressions properly
+ - **Bilingual**: Handle problems in both Chinese and English
+ - **Code Generation**: Write and execute Python/SymPy code for numerical solutions
+
+ ## Model Sources
+
+ - **Repository:** [github.com/Kirim-ai/Kirim-1-Math](https://github.com/Kirim-ai/Kirim-1-Math)
+ - **Paper:** [Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling](https://huggingface.co/papers)
+ - **Demo:** [huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo](https://huggingface.co/spaces/Kirim-ai/Kirim-1-Math-demo)
+ - **Base Model:** [Kirim-ai/Kirim-V1-base](https://huggingface.co/Kirim-ai/Kirim-V1-base)
+
+ ## Uses
+
+ ### Direct Use
+
+ The model can be used directly for:
+
+ - **Educational Tutoring**: Explain mathematical concepts with step-by-step reasoning
+ - **Homework Assistance**: Solve problems across all difficulty levels
+ - **Competition Preparation**: Practice for AMC, AIME, IMO, Putnam
+ - **Research Assistance**: Verify proofs and perform symbolic computations
+ - **Code-Assisted Problem Solving**: Use numerical methods for complex calculations
+
+ ### Downstream Use
+
+ Fine-tuning possibilities:
+
+ - Domain-specific mathematical applications (physics, engineering, finance)
+ - Custom tool integration for specialized computations
+ - Educational platforms with adaptive difficulty
+ - Mathematical theorem proving systems
+
+ ### Out-of-Scope Use
+
+ The model should NOT be used for:
+
+ - **Academic dishonesty**: Cheating on exams or assignments
+ - **Safety-critical systems**: Without human verification (e.g., structural engineering calculations)
+ - **Financial advice**: Trading or investment decisions without expert review
+ - **Medical calculations**: Drug dosages or medical equipment calibration
+ - **Legal matters**: Without professional mathematician/lawyer verification
+
+ ## Bias, Risks, and Limitations
+
+ ### Known Limitations
+
+ **Technical Limitations:**
+ - Cannot process visual mathematics (diagrams, geometric figures)
+ - May struggle with extremely novel mathematical concepts
+ - Knowledge is limited to training data through October 2024
+ - Tool execution can fail for edge cases
+ - Performance degrades on extremely complex graduate-level problems
+
+ **Reasoning Limitations:**
+ - May make logical errors in complex proofs
+ - Can hallucinate intermediate steps
+ - Occasionally produces incorrect final answers
+ - May not recognize when a problem has no solution
+
+ **Computational Limitations:**
+ - Cannot perform arbitrarily large calculations without tools
+ - Numerical precision is limited by the underlying libraries
+ - May time out on very long computations
+
+ ### Risks and Biases
+
+ **Potential Risks:**
+ - Students may become over-reliant on AI assistance
+ - Could generate plausible but incorrect mathematical reasoning
+ - May perpetuate biases in mathematical education approaches
+ - Tool execution could consume excessive computational resources
+
+ **Mitigation Strategies:**
+ - Always verify critical results with human experts
+ - Use a low sampling temperature (temperature=0.1) for more consistent, near-deterministic reasoning
+ - Enable tool calling for numerical verification
+ - Cross-check answers with multiple methods
+ - Implement appropriate safeguards in educational settings
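As a minimal sketch of the cross-checking strategy above, the snippet below verifies a reported answer to a quadratic equation in two independent ways: by substituting the claimed roots back into the equation, and by recomputing them with the quadratic formula. The helper names (`check_by_substitution`, `roots_by_formula`) and the example answer are hypothetical and are not part of the Kirim-1-Math repository.

```python
import math

def check_by_substitution(coeffs, roots, tol=1e-9):
    """Return True if every claimed root satisfies a*x^2 + b*x + c = 0."""
    a, b, c = coeffs
    return all(abs(a * x * x + b * x + c) <= tol for x in roots)

def roots_by_formula(coeffs):
    """Recompute the real roots with the quadratic formula (assumes a != 0)."""
    a, b, c = coeffs
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    sqrt_disc = math.sqrt(disc)
    return sorted({(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)})

coeffs = (1, -5, 6)    # x^2 - 5x + 6 = 0
claimed = [2.0, 3.0]   # hypothetical answer reported by the model

# Accept the answer only if both independent checks agree.
agree = check_by_substitution(coeffs, claimed) and roots_by_formula(coeffs) == sorted(claimed)
print(agree)
```

Requiring both checks to pass guards against a single faulty verification path; for symbolic answers the same idea applies with a computer algebra system in place of floating-point arithmetic.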
+
+ ## How to Get Started
+
+ ### Installation
+
+ ```bash
+ pip install torch transformers accelerate sympy
+ ```
+
+ ### Basic Usage
+
+ ```python
+ from transformers import AutoModelForCausalLM, AutoTokenizer
+
+ # Load the model; device_map="auto" requires the accelerate package
+ model = AutoModelForCausalLM.from_pretrained(
+     "Kirim-ai/Kirim-1-Math",
+     torch_dtype="auto",
+     device_map="auto",
+     trust_remote_code=True
+ )
+
+ tokenizer = AutoTokenizer.from_pretrained(
+     "Kirim-ai/Kirim-1-Math",
+     trust_remote_code=True
+ )
+
+ # Solve a problem
+ messages = [
+     {"role": "user", "content": "Solve: x² - 5x + 6 = 0"}
+ ]
+
+ # add_generation_prompt=True appends the assistant turn marker so the model answers
+ inputs = tokenizer.apply_chat_template(
+     messages, add_generation_prompt=True, return_tensors="pt"
+ ).to(model.device)
+ # temperature only takes effect when sampling is enabled
+ outputs = model.generate(inputs, max_new_tokens=2048, do_sample=True, temperature=0.1)
+ print(tokenizer.decode(outputs[0], skip_special_tokens=True))
+ ```
+
+ ### Using the Inference Script
+
+ ```bash
+ # Interactive mode
+ python inference_math.py --interactive
+
+ # Single problem
+ python inference_math.py --problem "Calculate the derivative of x^3 + 2x^2"
+
+ # With quantization
+ python inference_math.py --load_in_4bit --interactive
+ ```
+
+ ## Training Details
+
+ ### Training Data
+
+ **Mathematical Corpus (500B tokens):**
+ - Mathematical proofs: ProofWiki, Lean, Coq, Isabelle (125B tokens)
+ - Olympiad problems: IMO, USAMO, AMC, AIME, Putnam (150B tokens)
+ - arXiv papers: math.AC, math.AG, math.NT, math.CO (100B tokens)
+ - Textbooks: undergraduate to graduate level (75B tokens)
+ - Q&A: Math StackExchange, MathOverflow (50B tokens)
+
+ **Code Corpus (200B tokens):**
+ - Mathematical Python libraries (NumPy, SymPy, SciPy)
+ - Computational notebooks from Kaggle and GitHub
+ - Algorithm implementations
+
+ **General Corpus (800B tokens):**
+ - From Kirim-V1-base pre-training
+
+ **Total: 1.5 trillion tokens**
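The corpus breakdown above can be checked with a few lines of arithmetic. The sketch below only re-adds the token counts listed in the card (in billions); the dictionary keys are shorthand labels chosen for illustration.

```python
# Token counts in billions, copied from the corpus breakdown above.
math_corpus = {
    "proofs": 125,
    "olympiad": 150,
    "arxiv": 100,
    "textbooks": 75,
    "qa": 50,
}
code_corpus = 200
general_corpus = 800

math_total = sum(math_corpus.values())
grand_total = math_total + code_corpus + general_corpus
math_share = math_total / grand_total  # fraction of all tokens that are mathematical

print(math_total, grand_total, round(math_share, 3))
```

The sub-corpora sum to the stated 500B mathematical tokens, and the three corpora sum to 1,500B, so mathematical text makes up about one third of the total.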
+
+ ### Training Procedure
+
+ #### Stage 1: Model Expansion (15 days)
+ - Expanded from 13B to 30B parameters
+ - Progressive width and depth scaling
+ - Hidden size: 4096 → 5120
+ - Layers: 32 → 48
+
+ #### Stage 2: Mathematical Pre-training (30 days)
+ - 500B tokens of mathematical content
+ - Hardware: 512x NVIDIA H100 80GB
+ - Batch size: 2048
+ - Learning rate: 1.5e-4 with cosine decay
+ - Optimization: AdamW, BF16 precision
+
+ #### Stage 3: Instruction Tuning (5 days)
+ - 200K mathematical instruction-response pairs
+ - Balanced across algebra, calculus, geometry, and other areas
+ - Learning rate: 2e-5
+ - 3 epochs
+
+ #### Stage 4: Tool Calling Training (3 days)
+ - 50K tool-calling examples
+ - Function definition and execution
+ - Error handling and recovery
+
+ #### Stage 5: Reinforcement Learning (7 days)
+ - PPO-based training
+ - Reward based on solution correctness
+ - Symbolic and numerical verification
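The card does not describe the reward function used in Stage 5. As an illustration only, a correctness-based reward with numerical verification might look like the sketch below; the function name, the tolerance, and the binary 0/1 reward are all assumptions.

```python
import math

def correctness_reward(predicted: str, reference: float, rel_tol: float = 1e-6) -> float:
    """Return 1.0 if the predicted answer parses as a number close to the reference, else 0.0."""
    try:
        value = float(predicted.strip())
    except ValueError:
        return 0.0  # unparseable answers earn no reward
    close = math.isclose(value, reference, rel_tol=rel_tol, abs_tol=1e-9)
    return 1.0 if close else 0.0

print(correctness_reward("0.5", 0.5))       # exact match
print(correctness_reward("0.3333", 1 / 3))  # too imprecise for the default tolerance
print(correctness_reward("one half", 0.5))  # not a number
```

A symbolic check (for example, testing whether the difference of two expressions simplifies to zero) would complement this numeric comparison for answers that are not plain numbers.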
+
+ #### Training Hyperparameters
+
+ - **Optimizer:** AdamW
+ - **Learning rate:** 1.5e-4 (pre-training), 2e-5 (fine-tuning)
+ - **Weight decay:** 0.1
+ - **Warmup steps:** 2000
+ - **Gradient clipping:** 1.0
+ - **Precision:** BFloat16
+ - **Total GPU hours:** 30,720
+ - **Estimated cost:** $450,000 USD
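The hyperparameters above name a peak learning rate, a warmup length, and cosine decay, but not the exact schedule. The sketch below shows one common reading: linear warmup followed by cosine decay. The linear warmup shape, the `total_steps` value, and the final learning rate of zero are assumptions, not values from the card.

```python
import math

def learning_rate(step, peak_lr=1.5e-4, warmup_steps=2000, total_steps=100_000, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay to min_lr at total_steps."""
    if step < warmup_steps:
        # Ramp up linearly so the last warmup step reaches peak_lr.
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

for step in (0, 1000, 2000, 51_000, 100_000):
    print(step, learning_rate(step))
```

The schedule peaks at 1.5e-4 when warmup ends, passes through half the peak at the midpoint of the decay phase, and reaches `min_lr` at `total_steps`.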
+
+ ### Compute Infrastructure
+
+ - **Pre-training:** 512x NVIDIA H100 80GB GPUs
+ - **Fine-tuning:** 128x NVIDIA H100 80GB GPUs
+ - **Framework:** PyTorch 2.1, DeepSpeed ZeRO-3
+ - **Parallelism:** Tensor (8-way), Pipeline (4-way), Data (16-way)
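The three parallelism degrees above multiply to the 512 pre-training GPUs (8 × 4 × 16). The sketch below maps a global rank to its position in that 3D grid. The ordering, with tensor-parallel ranks varying fastest so that each group of 8 sits on adjacent devices, is an assumption; the card does not state the actual layout.

```python
TP, PP, DP = 8, 4, 16  # tensor-, pipeline-, and data-parallel degrees from the card

def rank_to_coords(rank):
    """Map a global rank to (dp, pp, tp) coordinates, with tensor parallel fastest-varying."""
    if not 0 <= rank < TP * PP * DP:
        raise ValueError("rank out of range")
    tp = rank % TP
    pp = (rank // TP) % PP
    dp = rank // (TP * PP)
    return dp, pp, tp

world_size = TP * PP * DP
print(world_size)
print(rank_to_coords(0), rank_to_coords(9), rank_to_coords(511))
```

Under this layout each data-parallel replica spans 32 GPUs (8 tensor-parallel ranks in each of 4 pipeline stages).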
+
+ ## Evaluation
+
+ ### Mathematical Reasoning
+
+ | Benchmark | Score | Comparison |
+ |-----------|-------|------------|
+ | GSM8K | 94.2% | GPT-4: 92.0% |
+ | MATH | 78.5% | GPT-4: 76.4% |
+ | MMLU-Math | 88.7% | GPT-4: 86.9% |
+ | AMC10/12 | 72.3% | Human avg: 45% |
+ | AIME | 38.7% | Human qualifier: 40% |
+
+ ### Tool Calling
+
+ | Metric | Score |
+ |--------|-------|
+ | Tool Selection | 96.8% |
+ | Parameter Extraction | 94.2% |
+ | Execution Success | 92.5% |
+ | Result Integration | 95.1% |
+
+ ### Code Generation
+
+ | Task | Pass@1 | Pass@10 |
+ |------|--------|---------|
+ | HumanEval-Math | 78.3% | 92.1% |
+ | SymPy Tasks | 82.5% | 94.7% |
+ | NumPy Tasks | 75.6% | 89.3% |
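The card does not state how Pass@1 and Pass@10 were computed. A widely used choice is the unbiased estimator that draws `n` samples per task, counts the `c` correct ones, and computes the probability that a random subset of size `k` contains at least one correct sample. The sketch below implements that estimator; whether it matches the evaluation behind the table is an assumption.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical task: 20 samples, 5 of them correct.
print(pass_at_k(20, 5, 1))
print(pass_at_k(20, 5, 10))
```

A benchmark score is then the mean of `pass_at_k` over all tasks.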
+
+ ### Performance
+
+ - **Inference Speed:** 45 tokens/second (A100 80GB)
+ - **Memory:** 60 GB (BF16), 30 GB (INT8), 20 GB (INT4)
+ - **Latency:** 89 ms mean, 145 ms p95
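The memory figures above can be related to the parameter count with a rough weight-only estimate: parameters times bytes per parameter. The sketch below assumes 30 billion parameters and decimal gigabytes, and ignores activations, the KV cache, and quantization metadata.

```python
PARAMS = 30e9  # parameter count stated in the card

BYTES_PER_PARAM = {"BF16": 2.0, "INT8": 1.0, "INT4": 0.5}

def weight_memory_gb(params, bytes_per_param):
    """Weight-only memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9

for name, width in BYTES_PER_PARAM.items():
    print(name, weight_memory_gb(PARAMS, width))
```

The BF16 and INT8 estimates match the reported 60 GB and 30 GB. The INT4 estimate is 15 GB, below the reported 20 GB; the difference may come from quantization scales, layers kept in higher precision, or runtime overhead, none of which the card specifies.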
+
+ ## Environmental Impact
+
+ - **Hardware:** NVIDIA H100 GPUs
+ - **Training Time:** 60 days (30,720 GPU hours)
+ - **Estimated CO₂:** ~8,500 kg CO₂eq
+ - **Power Consumption:** ~850 MWh
+
+ We are committed to reducing environmental impact through efficient training and model optimization.
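GPU-hours are conventionally the number of GPUs multiplied by wall-clock hours. The sketch below applies that convention to the GPU counts and stage durations given in this card, assuming continuous use of all GPUs for the full duration; the `upper_bound` case is hypothetical.

```python
HOURS_PER_DAY = 24

def gpu_hours(num_gpus, days):
    """GPU-hours consumed by num_gpus running continuously for the given number of days."""
    return num_gpus * days * HOURS_PER_DAY

stage2 = gpu_hours(512, 30)       # mathematical pre-training, per the card
upper_bound = gpu_hours(512, 60)  # hypothetical: all 60 days on the full 512-GPU cluster
print(stage2, upper_bound)
```

Under that convention the 30-day pre-training stage alone comes to 368,640 GPU-hours, well above the reported 30,720 (which equals 512 × 60). The reported total may therefore rest on a different accounting than the one sketched here.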
+
+ ## Technical Specifications
+
+ ### Model Architecture
+
+ | Parameter | Value |
+ |-----------|-------|
+ | Parameters | 30B |
+ | Hidden Size | 5,120 |
+ | Layers | 48 |
+ | Attention Heads | 40 |
+ | KV Heads | 8 (GQA) |
+ | Intermediate Size | 13,824 |
+ | Vocabulary | 102,400 |
+ | Context Length | 32,768 |
+ | Position Encoding | RoPE with YaRN |
+ | Activation | SiLU |
+ | Normalization | RMSNorm |
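A rough parameter count can be derived from the dimensions in the table. The sketch below assumes a LLaMA-style decoder block: bias-free projections, a gated MLP with three weight matrices, two RMSNorm weights per layer, and untied input and output embeddings. None of these details are confirmed by the card, and the function name is illustrative.

```python
def estimate_params(hidden, layers, heads, kv_heads, intermediate, vocab, tied_embeddings=False):
    """Rough parameter count for a LLaMA-style decoder (no biases, gated MLP)."""
    head_dim = hidden // heads
    kv_dim = kv_heads * head_dim
    attention = 2 * hidden * hidden + 2 * hidden * kv_dim  # Q and O, plus K and V
    mlp = 3 * hidden * intermediate                         # gate, up, and down projections
    norms = 2 * hidden                                      # two RMSNorm weights per layer
    embeddings = vocab * hidden * (1 if tied_embeddings else 2)
    return layers * (attention + mlp + norms) + embeddings + hidden  # + final norm

total = estimate_params(5120, 48, 40, 8, 13_824, 102_400)
print(f"{total / 1e9:.2f}B")
```

Under these assumptions the listed dimensions give roughly 14.3B parameters, so the stated 30B total presumably depends on architectural details not shown in the table.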
+
+ ### Special Features
+
+ - **Tool Calling:** JSON-based function calling
+ - **Symbolic Solver:** SymPy integration
+ - **Code Execution:** Sandboxed Python runtime
+ - **LaTeX Formatting:** Automatic equation formatting
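To illustrate what JSON-based function calling involves on the host side, the sketch below parses a tool call, runs the named tool, and returns a JSON result that could be fed back to the model. The tool names, the `name`/`arguments` schema, and the result format are hypothetical; the card does not document the actual schema.

```python
import json
import math

# Hypothetical tool registry; the real tool names and schemas may differ.
TOOLS = {
    "add": lambda a, b: a + b,
    "power": lambda base, exponent: base ** exponent,
    "sqrt": lambda x: math.sqrt(x),
}

def dispatch(tool_call_json: str) -> str:
    """Parse a JSON tool call, run the named tool, and return a JSON result."""
    try:
        call = json.loads(tool_call_json)
        result = TOOLS[call["name"]](**call["arguments"])
        return json.dumps({"ok": True, "result": result})
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
        # Report failures back to the model instead of raising.
        return json.dumps({"ok": False, "error": type(exc).__name__})

print(dispatch('{"name": "power", "arguments": {"base": 2, "exponent": 10}}'))
print(dispatch('{"name": "sqrt", "arguments": {"x": -1}}'))
```

Returning errors as data rather than raising lets the model see the failure and retry with corrected arguments, which is in line with the error handling and recovery training mentioned in this card.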
+
+ ## Citation
+
+ ```bibtex
+ @misc{kirim2025math,
+   title={Kirim-1-Math: Advanced Mathematical Reasoning with Tool Calling},
+   author={Qiling Research},
+   year={2025},
+   publisher={Kirim AI},
+   url={https://huggingface.co/Kirim-ai/Kirim-1-Math}
+ }
+ ```
+
+ ## Model Card Authors
+
+ Qiling Research
+
+ ## Ethical Considerations
+
+ ### Educational Impact
+
+ - May affect traditional mathematics education
+ - Could reduce development of mental math skills
+ - Should be used as a learning aid, not a replacement
+
+ ### Accessibility
+
+ - Makes advanced mathematics more accessible
+ - Could democratize STEM education
+ - May widen the gap if access is unequal
+
+ ### Verification
+
+ - Always verify results for critical applications
+ - Use multiple methods for important calculations
+ - Maintain human oversight in education
+
+ ## Glossary
+
+ - **Tool Calling:** Ability to invoke external functions for computation
+ - **Symbolic Solver:** Algebraic manipulation system (SymPy)
+ - **GQA:** Grouped Query Attention, in which several query heads share one key/value head for efficiency
+ - **RoPE:** Rotary Position Embedding
+ - **YaRN:** Yet another RoPE extensioN, a method for extending the context window of RoPE-based models
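The efficiency benefit of GQA shows up most clearly in the key/value cache. The sketch below estimates the cache size at the full 32,768-token context using the layer and head counts from this card, and compares it with a hypothetical variant in which every attention head has its own key/value head. A 2-byte (BF16) cache and a batch size of 1 are assumptions.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, bytes_per_value=2, batch=1):
    """Size of the key/value cache: two tensors (K and V) per layer."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value * batch

GIB = 1024 ** 3
head_dim = 5120 // 40  # hidden size / attention heads = 128

gqa = kv_cache_bytes(layers=48, kv_heads=8, head_dim=head_dim, seq_len=32_768)
mha = kv_cache_bytes(layers=48, kv_heads=40, head_dim=head_dim, seq_len=32_768)
print(gqa / GIB, mha / GIB)
```

With 8 KV heads the cache takes 6 GiB per sequence, against 30 GiB with 40 KV heads, a fivefold reduction that matches the ratio of query heads to KV heads.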