Reza2kn commited on
Commit
bfd356f
·
verified ·
1 Parent(s): f3646e0

Upload validated CoreML 4-bit MiniCPM5-1B

Browse files
MiniCPM5-1B-CoreML-4bit.mlpackage/Data/com.apple.CoreML/model.mlmodel ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:3f7fce44fabfa305884cf4b33fe98e7e6b5021ebaf85e9dd89de5f8ddc36b0a5
3
+ size 440798
MiniCPM5-1B-CoreML-4bit.mlpackage/Data/com.apple.CoreML/weights/weight.bin ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:fa1039d7159073e277f0b9ac46234f70e9afa2f124df3b343512224d9fc9afdc
3
+ size 608400256
MiniCPM5-1B-CoreML-4bit.mlpackage/Manifest.json ADDED
@@ -0,0 +1,18 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "fileFormatVersion": "1.0.0",
3
+ "itemInfoEntries": {
4
+ "930EC706-A3BC-441D-90A6-12E95A85C314": {
5
+ "author": "com.apple.CoreML",
6
+ "description": "CoreML Model Specification",
7
+ "name": "model.mlmodel",
8
+ "path": "com.apple.CoreML/model.mlmodel"
9
+ },
10
+ "C116271F-A8AB-4B5C-AFCD-366035D47291": {
11
+ "author": "com.apple.CoreML",
12
+ "description": "CoreML Model Weights",
13
+ "name": "weights",
14
+ "path": "com.apple.CoreML/weights"
15
+ }
16
+ },
17
+ "rootModelIdentifier": "930EC706-A3BC-441D-90A6-12E95A85C314"
18
+ }
README.md ADDED
@@ -0,0 +1,351 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - en
5
+ - zh
6
+ library_name: transformers
7
+ pipeline_tag: text-generation
8
+ tags:
9
+ - minicpm
10
+ - minicpm5
11
+ - llama
12
+ - text-generation
13
+ - long-context
14
+ - tool-calling
15
+ - on-device
16
+ - edge-ai
17
+ datasets:
18
+ - openbmb/Ultra-FineWeb
19
+ - openbmb/Ultra-FineWeb-L3
20
+ - openbmb/UltraData-Math
21
+ - openbmb/UltraData-SFT-2605
22
+ ---
23
+
24
+ <div align="center">
25
+ <img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm_logo.png" width="500em" />
26
+ </div>
27
+
28
+ <p align="center">
29
+ <a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM Tech Report</a> |
30
+ <a href="https://github.com/OpenBMB/MiniCPM" target="_blank">GitHub Repo</a> |
31
+ <a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
32
+ <a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM Desk Pet</a> |
33
+ <a href="https://huggingface.co/spaces/openbmb/MiniCPM5-1B-Demo" target="_blank">Online Demo</a>
34
+ </p>
35
+
36
+ <p align="center">
37
+ English |
38
+ <a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README-cn.md" target="_blank">中文</a>
39
+ </p>
40
+
41
+ ## Highlights
42
+
43
+ We are releasing **MiniCPM5-1B**, the first model in the **MiniCPM5** series. It is a dense 1B Transformer built for on-device, local deployment, and resource-constrained scenarios, reaching 1B-class open-source SOTA.
44
+
45
+ 🏆 **1B-class open-source SOTA**: compared with strong open-source models in the same size class, MiniCPM5-1B reaches SOTA within this comparison set. Its advantage is most visible in agentic tool use, code generation, and difficult reasoning.
46
+
47
+ ![MiniCPM5-1B capability comparison by domain](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/public_leaderboard_radar_en.png)
48
+
49
+ 🧠 **Hybrid Reasoning**: built-in `<think>` chat template, switch via `enable_thinking`. The same checkpoint serves as both a fast assistant and a deliberate reasoner.
50
+
51
+ 🛠️ **Deployment / Fine-tuning Resources**: the MiniCPM GitHub repo provides single-page cookbooks and Agent Skills for major inference backends and fine-tuning frameworks.
52
+
53
+ 🐱 **Desktop Pet**: a local-LLM desktop pet driven by MiniCPM5-1B.
54
+
55
+ ## Model List
56
+
57
+ Use this directory to choose the model format that matches your runtime:
58
+
59
+ - **[MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B) · BF16 final release (post-trained with RL + OPD) **👈 you are here**
60
+ - **[MiniCPM5-1B-SFT](https://huggingface.co/openbmb/MiniCPM5-1B-SFT)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-SFT) · BF16 SFT-only checkpoint (before RL / OPD)
61
+ - **[MiniCPM5-1B-Base](https://huggingface.co/openbmb/MiniCPM5-1B-Base)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-Base) · BF16 base checkpoint (pre-training only)
62
+ - **[MiniCPM5-1B-GGUF](https://huggingface.co/openbmb/MiniCPM5-1B-GGUF)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-GGUF) · GGUF for llama.cpp / Ollama / LM Studio
63
+ - **[MiniCPM5-1B-MLX](https://huggingface.co/openbmb/MiniCPM5-1B-MLX)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-MLX) · MLX / 4bit for Apple Silicon
64
+
65
+ ## Model Information
66
+
67
+ MiniCPM5-1B has the following features:
68
+
69
+ - **Type**: Causal Language Model
70
+ - **Architecture**: Standard `LlamaForCausalLM`
71
+ - **Number of Parameters**: 1,080,632,832
72
+ - **Number of Non-Embedding Parameters**: 679,552,512
73
+ - **Number of Layers**: 24
74
+ - **Number of Attention Heads (GQA)**: 16 for Q and 2 for KV
75
+ - **Context Length**: 131,072
76
+
77
+ ## Introduction
78
+
79
+ MiniCPM5-1B is the first checkpoint in the MiniCPM5 series. It is designed for local assistants, coding agents, tool-use workflows, and reasoning scenarios where a compact model is preferred. The model keeps a small deployment footprint while providing native long-context support and both Think / No Think chat modes through the same checkpoint.
80
+
81
+ ## Evaluation Results
82
+
83
+ We compare MiniCPM5-1B with strong open-source models in the same size class, including **LFM2.5-1.2B-Thinking**, **Qwen3-0.6B/think** and **Qwen3.5-0.8B/think**. These are capable baselines; within this comparison set, MiniCPM5-1B reaches 1B-class open-source SOTA, with its advantage most visible in tool use, code generation, and difficult reasoning. This makes it a practical choice for local coding agents, tool assistants, and reasoning assistants.
84
+
85
+ ![MiniCPM-5 1B Public Leaderboard](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/public_leaderboard_en.png)
86
+
87
+ ## Training Recipe
88
+
89
+ The training of MiniCPM5-1B is a full-stack practice of **[UltraData Tiered Data Management](https://arxiv.org/pdf/2602.09003)**, covering three stages: base training, mid-training, and post-training.
90
+
91
+ During **base training**, the model goes through stable training and decay training to build core language capability and training stability. It then enters **mid-training** to further strengthen target capabilities and adapt to the target data distribution. The training corpus is released alongside the model as [Ultra-FineWeb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb), [Ultra-FineWeb-L3](https://huggingface.co/datasets/openbmb/Ultra-FineWeb-L3), and [UltraData-Math](https://huggingface.co/datasets/openbmb/UltraData-Math).
92
+
93
+ During **post-training**, we proceed in three steps: **SFT**, **RL**, and **OPD**. We first use **200B tokens of deep-thinking SFT** and **200B tokens of hybrid-thinking SFT** to establish deep-thinking, hybrid-thinking, and general chat abilities; the SFT data is released as [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605). We then train specialized **RL teachers** for math, code, closed-book QA, writing, and related domains, and use **On-Policy Distillation (OPD)** to distill these teachers back into one release model.
94
+
95
+ ![MiniCPM5-1B Training Recipe](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/training_recipe.png)
96
+
97
+ ### What does RL + OPD bring?
98
+
99
+ **RL + OPD** is a key part of MiniCPM5-1B post-training. On math, code and instruction-following tasks, RL + OPD raises the average score by **↑16 points** while cutting the share of responses that hit the max-tokens budget by **↓29 percentage points**. The figures below show the two-stage Reasoning RL pipeline, score gains, and the drop in overlong responses.
100
+
101
+ **RL** combines complementary training signals for reasoning, closed-book QA, writing, instruction following, long-context understanding, and general dialogue. Reasoning RL is based on [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) (inspired by [JustRL](https://arxiv.org/pdf/2512.16649)'s minimalist recipe) and uses a two-stage length schedule to reduce overlong responses while improving reasoning accuracy. We also use [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa), [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData), synthesized verifiable RLVR data, and pair-wise RLHF signals to improve reliability, instruction following, and user experience.
102
+
103
+ ![MiniCPM5-1B RL Two-stage Pipeline](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_two_stage_overview.png)
104
+
105
+ **OPD** builds on Thinking Machines Lab's [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) and incorporates implementation improvements from [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016). In the RL framework, we use reverse KL divergence as the advantage estimate, replacing the original verification-based advantage. At each response position, we take top-k logits from both the student and teacher models, compute reverse KL on the union of the two token sets, and balance the accuracy of the RKL signal with training efficiency. OPD reuses the in-domain prompts used to train each RL teacher as distillation data, so no additional data curation is required.
106
+
107
+ ![MiniCPM5-1B RL + OPD Gains](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_gains.png)
108
+
109
+ ![MiniCPM5-1B RL + OPD Overlong Response Rate Drop](https://raw.githubusercontent.com/OpenBMB/MiniCPM/main/assets/minicpm5/rl_overlong.png)
110
+
111
+ ## Quickstart
112
+
113
+ ### vLLM
114
+
115
+ ```bash
116
+ pip install "vllm>=0.21"
117
+ vllm serve openbmb/MiniCPM5-1B --port 8000
118
+ ```
119
+
120
+ ```bash
121
+ curl http://localhost:8000/v1/chat/completions \
122
+ -H "Content-Type: application/json" \
123
+ -d '{
124
+ "model": "openbmb/MiniCPM5-1B",
125
+ "messages": [{"role": "user", "content": "Who are you? Please briefly introduce yourself."}],
126
+ "max_tokens": 128,
127
+ "temperature": 0.7
128
+ }'
129
+ ```
130
+
131
+ ### SGLang
132
+
133
+ ```bash
134
+ pip install "sglang[srt]>=0.5.12"
135
+ python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000
136
+ ```
137
+
138
+ ```bash
139
+ curl http://localhost:30000/v1/chat/completions \
140
+ -H "Content-Type: application/json" \
141
+ -d '{
142
+ "model": "openbmb/MiniCPM5-1B",
143
+ "messages": [{"role": "user", "content": "Who are you? Please briefly introduce yourself."}],
144
+ "max_tokens": 128,
145
+ "temperature": 0.7
146
+ }'
147
+ ```
148
+
149
+ ### Transformers
150
+
151
+ ```bash
152
+ pip install -U "transformers>=5.6" accelerate torch
153
+ ```
154
+
155
+ ```python
156
+ from transformers import AutoModelForCausalLM, AutoTokenizer
157
+
158
+ model_id = "openbmb/MiniCPM5-1B"
159
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
160
+ model = AutoModelForCausalLM.from_pretrained(
161
+ model_id,
162
+ torch_dtype="auto",
163
+ device_map="auto",
164
+ )
165
+
166
+ messages = [{"role": "user", "content": "Who are you? Please briefly introduce yourself."}]
167
+ inputs = tokenizer.apply_chat_template(
168
+ messages,
169
+ tokenize=True,
170
+ add_generation_prompt=True,
171
+ enable_thinking=False,
172
+ return_dict=True,
173
+ return_tensors="pt",
174
+ ).to(model.device)
175
+
176
+ outputs = model.generate(**inputs, max_new_tokens=128)
177
+ print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
178
+ ```
179
+
180
+ Recommended chat template sampling:
181
+
182
+ | Mode | Recommended params | Enable |
183
+ | --- | --- | --- |
184
+ | **Think** | `temperature=0.9, top_p=0.95` | `enable_thinking=True` |
185
+ | **No Think** | `temperature=0.7, top_p=0.95` | `enable_thinking=False` |
186
+
187
+ ## Tool Calling
188
+
189
+ For tool / function calling, **SGLang is the recommended backend**. MiniCPM5-1B emits XML-style tool calls and SGLang's built-in `minicpm5` parser converts them to OpenAI-compatible `tool_calls` natively:
190
+
191
+ ```bash
192
+ python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000 \
193
+ --tool-call-parser minicpm5 # or: --tool-call-parser auto
194
+ ```
195
+
196
+ ## GitHub Cookbooks and Agent Skills
197
+
198
+ MiniCPM5-1B uses the **standard `LlamaForCausalLM` architecture**, so mainstream inference engines can load it directly: **no custom kernels, no model-code fork**. For step-by-step deployment and fine-tuning instructions, use the GitHub cookbooks below. Agent Skills are linked as GitHub resources for users working with Cursor / Claude Code style coding agents.
199
+
200
+ ### Deployment
201
+
202
+ | Backend | Model format / use case | Cookbook | Agent Skill |
203
+ | --- | --- | --- | --- |
204
+ | Transformers | BF16 / FP16 local Python inference, GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-transformers/SKILL.md) |
205
+ | vLLM | BF16 / FP16 OpenAI server | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/vllm.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-vllm/SKILL.md) |
206
+ | SGLang | BF16 / FP16 OpenAI server, recommended for tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-sglang/SKILL.md) |
207
+ | llama.cpp | GGUF local inference, CPU/GPU | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
208
+ | Ollama | GGUF local on-device runtime | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-ollama/SKILL.md) |
209
+ | LM Studio | GGUF Mac desktop app and OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-lmstudio/SKILL.md) |
210
+ | MLX | MLX / 4bit local inference on Apple Silicon | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-mlx/SKILL.md) |
211
+ | ArcLight | GGUF local on-device, CPU, Desktop & Server | [arclight.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/deployment/arclight.md) | [minicpm5-deploy-arclight](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-deploy-arclight/SKILL.md) |
212
+
213
+ ### Fine-tuning
214
+
215
+ | Framework | Use case | Cookbook | Agent Skill |
216
+ | --- | --- | --- | --- |
217
+ | TRL + PEFT | LoRA / SFT fine-tuning | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-trl/SKILL.md) |
218
+ | LLaMA-Factory | Fine-tuning | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-llamafactory/SKILL.md) |
219
+ | ms-swift | Fine-tuning | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-ms-swift/SKILL.md) |
220
+ | unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-unsloth/SKILL.md) |
221
+ | xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/main/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/main/skills/minicpm5-finetune-xtuner/SKILL.md) |
222
+
223
+ ### Other Supported Frameworks
224
+
225
+ In addition to the deployment and fine-tuning frameworks listed above, MiniCPM5-1B is also supported by FlagOS for multi-chip deployment.
226
+
227
+ #### FlagOS Overview
228
+
229
+ To enable large-scale deployment across different AI chips, Beijing Zhiyuan Research Institute, together with numerous research institutions, chip manufacturers, system vendors, and algorithm and software organizations both domestically and internationally, jointly initiated and established the FlagOS Open Source Community.
230
+
231
+ The FlagOS community is dedicated to building a unified, open-source system software stack for various AI chips, encompassing core open-source projects such as a large-scale operator library, a unified AI compiler, parallel training and inference frameworks, and a unified communication library. It aims to create an open technology ecosystem connecting the “model-system-chip” layers. By enabling “develop once, deploy across chips”, FlagOS unlocks the computational potential of hardware, breaks down the ecosystem silos between different chip software stacks, and effectively reduces migration costs for developers.The FlagOS community fosters an AI hardware and software ecosystem, overcomes single-vendor closed-source monopolies, promotes widespread deployment of AI hardware technologies, and is committed to rooted in China while embracing global collaboration.
232
+
233
+ Official website express: [https://flagos.io](https://flagos.io/)
234
+
235
+ <details>
236
+ <summary>FlagOS multi-chip support and usage</summary>
237
+
238
+ #### FlagOS: Supporting Multiple AI Chips
239
+
240
+ Thanks to FlagOS’s unified multi-chip AI system software stack, MiniCPM5-1B was adapted to 4–5 different AI chips in an extremely short time. Currently, the multi-chip version of MiniCPM5-1B has been released on FlagRelease, FlagOS’s platform for automatic migration, adaptation, and deployment of large models across multi-architecture AI chips. Details are as follows:
241
+
242
+ |Vendor|ModelScope|Huggingface|
243
+ |---|---|---|
244
+ |Nvidia|[MiniCPM5-1B-nvidia-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-nvidia-FlagOS)|[MiniCPM5-1B-nvidia-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-nvidia-FlagOS)|
245
+ |Hygon|[MiniCPM5-1B-hygon-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-hygon-FlagOS)|[MiniCPM5-1B-hygon-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-hygon-FlagOS)|
246
+ |Metax|[MiniCPM5-1B-metax-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-metax-FlagOS)|[MiniCPM5-1B-metax-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-metax-FlagOS)|
247
+ |Iluvatar|[MiniCPM5-1B-iluvatar-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-iluvatar-FlagOS)|[MiniCPM5-1B-iluvatar-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-iluvatar-FlagOS)|
248
+ |Zhenwu|[MiniCPM5-1B-zhenwu-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-zhenwu-FlagOS)|[MiniCPM5-1B-zhenwu-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-zhenwu-FlagOS)|
249
+ |Mthreads|[MiniCPM5-1B-mthreads-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-mthreads-FlagOS)|[MiniCPM5-1B-mthreads-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-mthreads-FlagOS)|
250
+ |Kunlunxin|[MiniCPM5-1B-kunlunxin-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-kunlunxin-FlagOS)|[MiniCPM5-1B-kunlunxin-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-kunlunxin-FlagOS)|
251
+ |Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
252
+ |ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|
253
+
254
+ #### FlagOS Usage
255
+
256
+ ##### FlagOS Performance Acceleration on Nvidia
257
+
258
+ ###### From FlagRelease (**Recommendation**)
259
+
260
+ FlagRelease is a platform developed by the FlagOS team for automatic migration, adaptation, and deployment of large models across multi-architecture AI chips. The multi-chip version of MiniCPM5-1B has already been released on FlagRelease. All necessary software packages are pre-installed on the platform, so users do not need to install anything.
261
+
262
+ ###### FlagRelease Image Key Versions
263
+
264
+ ###### FlagRelease Quick Start
265
+
266
+ |Vendor|ModelScope|Huggingface|
267
+ |---|---|---|
268
+ |Nvidia|[MiniCPM5-1B-nvidia-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-nvidia-FlagOS)|[MiniCPM5-1B-nvidia-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-nvidia-FlagOS)|
269
+ |Hygon|[MiniCPM5-1B-hygon-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-hygon-FlagOS)|[MiniCPM5-1B-hygon-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-hygon-FlagOS)|
270
+ |Metax|[MiniCPM5-1B-metax-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-metax-FlagOS)|[MiniCPM5-1B-metax-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-metax-FlagOS)|
271
+ |Iluvatar|[MiniCPM5-1B-iluvatar-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-iluvatar-FlagOS)|[MiniCPM5-1B-iluvatar-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-iluvatar-FlagOS)|
272
+ |Zhenwu|[MiniCPM5-1B-zhenwu-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-zhenwu-FlagOS)|[MiniCPM5-1B-zhenwu-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-zhenwu-FlagOS)|
273
+ |Mthreads|[MiniCPM5-1B-mthreads-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-mthreads-FlagOS)|[MiniCPM5-1B-mthreads-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-mthreads-FlagOS)|
274
+ |Kunlunxin|[MiniCPM5-1B-kunlunxin-FlagOS](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-kunlunxin-FlagOS)|[MiniCPM5-1B-kunlunxin-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-kunlunxin-FlagOS)|
275
+ |Ascend|[MiniCPM5-1B-ascend-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|[MiniCPM5-1B-ascend-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-ascend-FlagOS)|
276
+ |ARM-v9|[MiniCPM5-1B-Armv9-FlagOS](https://modelscope.cn/models/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|[MiniCPM5-1B-Armv9-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-Armv9-FlagOS)|
277
+
278
+ ###### From Scratch
279
+
280
+ - Dependencies: Python 3.12, GLIBC 2.39, GLIBCXX 3.4.33, CXXABI 1.3.15
281
+
282
+ ###### Vllm Version
283
+
284
+ ###### Installing the FlagOS Operator Library
285
+
286
+ Official Repository: https://github.com/flagos-ai/FlagGems
287
+
288
+ ```PowerShell
289
+ pip install flag-gems==4.2.1rc0
290
+ pip install triton==3.5.1
291
+ ```
292
+
293
+ ###### Activating Acceleration
294
+
295
+ You can enable flagGems acceleration by adding the import of flagGems in the source code of vllm where inference is performed.
296
+
297
+ ```Bash
298
+ import flag_gems
299
+ flag_gems.enable(record=True, once=True, path="/root/gems.txt")
300
+ ```
301
+
302
+ ```PowerShell
303
+ vllm serve ${model_path} \
304
+ --trust-remote-code \
305
+ --dtype bfloat16 \
306
+ --enforce-eager \
307
+ --port ${Port} \
308
+ --served-model-name ${model_name} \
309
+ --gpu-memory-utilization 0.85
310
+ ```
311
+
312
+ ##### Using FlagOS Unified Multi-Chip Backend Plugin
313
+
314
+ [**vllm-plugin-FL**](https://github.com/flagos-ai/vllm-plugin-FL) is a plugin built for the vLLM inference/service framework. Developed on top of FlagOS’s unified multi-chip backend, it is designed to extend vLLM’s capabilities and performance across a variety of hardware environments.
315
+
316
+ ###### Using vllm-plugin-FL
317
+
318
+ |Vendor|From Scratch|From FlagRelease||
319
+ |---|---|---|---|
320
+ |Nvidia|[vllm-plugin-FL/MiniCPM5-1B](https://github.com/flagos-ai/vllm-plugin-FL/blob/main/examples/minicpm/README.md)|[MiniCPM5-1B-ModelScope](https://www.modelscope.cn/models/FlagRelease/MiniCPM5-1B-nvidia-FlagOS)|[MiniCPM5-1B-nvidia-FlagOS](https://huggingface.co/FlagRelease/MiniCPM5-1B-nvidia-FlagOS)|
321
+
322
+ </details>
323
+
324
+ ## Desktop Pet
325
+
326
+ We also ship **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desktop pet driven locally by MiniCPM5-1B. It supports Apple Silicon / NVIDIA GPU / CPU paths, can work with coding agents such as Cursor, Claude Code, and Codex, and supports LoRA persona switching.
327
+
328
+ <a href="https://youtu.be/Ee0slMW8SEk"><img src="https://img.youtube.com/vi/Ee0slMW8SEk/0.jpg" alt="MiniCPM Desk Pet video demo" width="720"></a>
329
+
330
+ ## Limitations and Responsible Use
331
+
332
+ MiniCPM5-1B is a language model that generates content based on learned statistical patterns from training data. It may produce inaccurate, biased, or unsafe outputs, and generated content should be reviewed and verified before use in high-stakes settings.
333
+
334
+ Users are responsible for evaluating outputs, applying appropriate safeguards, and complying with applicable laws, regulations, and platform policies.
335
+
336
+ ## License
337
+
338
+ This repository and MiniCPM model weights are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
339
+
340
+ ## Citation
341
+
342
+ Please cite our paper if you find our work valuable:
343
+
344
+ ```bibtex
345
+ @article{minicpm4,
346
+ title={Minicpm4: Ultra-efficient llms on end devices},
347
+ author={MiniCPM, Team},
348
+ journal={arXiv preprint arXiv:2506.07900},
349
+ year={2025}
350
+ }
351
+ ```
chat_template.jinja ADDED
@@ -0,0 +1,179 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {{- bos_token }}{%- if tools %}
2
+ {%- set tool_definitions %}
3
+ {{- "# Tools\n\nYou are provided with function signatures within <tools></tools> XML tags:\n<tools>" }}
4
+ {%- for tool in tools %}
5
+ {{- "\n" }}
6
+ {{- tool | tojson(ensure_ascii=False) }}
7
+ {%- endfor %}
8
+ {{- '\n</tools>\n\nTool usage guidelines:\n- You may call zero or more functions. If no function calls are needed, just answer normally and do not include any <function ... </function>.\n- When calling a function, return an XML object within <function ... </function> using:\n<function name="function-name"><param name="param-name">param-value</param></function>\n- param-value may be multi-line. If it contains <, & or newline characters, wrap it in a CDATA block: <param name="param-name"><![CDATA[...multi-line value...]]></param>' }}
9
+ {%- endset %}
10
+
11
+ {{- '<|im_start|>system\n' }}
12
+ {%- if messages[0].role == 'system' %}
13
+ {%- if '<tool_def_sep>' in messages[0].content %}
14
+ {{- messages[0].content.replace('<tool_def_sep>', tool_definitions) }}
15
+ {%- else %}
16
+ {{- messages[0].content + '\n\n' + tool_definitions }}
17
+ {%- endif %}
18
+ {%- else %}
19
+ {{- tool_definitions.lstrip() }}
20
+ {%- endif %}
21
+ {{- '<|im_end|>\n' }}
22
+ {%- else %}
23
+ {%- if messages[0].role == 'system' %}
24
+ {{- '<|im_start|>system\n' + messages[0].content + '<|im_end|>\n' }}
25
+ {%- endif %}
26
+ {%- endif %}
27
+ {%- set ns = namespace(multi_step_tool=true, last_query_index=messages|length - 1) %}
28
+ {%- for message in messages[::-1] %}
29
+ {%- set index = (messages|length - 1) - loop.index0 %}
30
+ {%- if ns.multi_step_tool and message.role == "user" and message.content is string and not(message.content.startswith('<tool_response>') and message.content.endswith('</tool_response>')) %}
31
+ {%- set ns.multi_step_tool = false %}
32
+ {%- set ns.last_query_index = index %}
33
+ {%- endif %}
34
+ {%- endfor %}
35
+ {%- for message in messages %}
36
+ {%- if message.content is string %}
37
+ {%- set content = message.content %}
38
+ {%- else %}
39
+ {%- set content = '' %}
40
+ {%- endif %}
41
+ {%- if (message.role == "user") or (message.role == "system" and not loop.first) %}
42
+ {{- '<|im_start|>' + message.role + '\n' + content + '<|im_end|>' + '\n' }}
43
+ {%- elif message.role == "assistant" %}
44
+ {%- set reasoning_content = '' %}
45
+ {%- if message.reasoning_content is string %}
46
+ {%- set reasoning_content = message.reasoning_content %}
47
+ {%- else %}
48
+ {%- if '</think>' in content %}
49
+ {%- set reasoning_content = content.split('</think>')[0].rstrip('\n').split('<think>')[-1].lstrip('\n') %}
50
+ {%- set content = content.split('</think>')[-1].lstrip('\n') %}
51
+ {%- endif %}
52
+ {%- endif %}
53
+
54
+ {%- if message.tool_calls %}
55
+ {%- set content_parts = content.split('<tool_sep>') %}
56
+ {%- set processed_content = content_parts[0] %}
57
+ {%- set tool_calls_count = message.tool_calls|length %}
58
+ {%- set tool_sep_count = content_parts|length - 1 %}
59
+ {%- set min_count = [tool_calls_count, tool_sep_count]|min %}
60
+
61
+ {%- for i in range(1, content_parts|length) %}
62
+ {%- set tool_index = i - 1 %}
63
+ {%- if tool_index < tool_calls_count %}
64
+ {%- set tool_call = message.tool_calls[tool_index] %}
65
+ {%- if tool_call.function %}
66
+ {%- set tool_call = tool_call.function %}
67
+ {%- endif %}
68
+ {%- set single_tool_xml %}
69
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
70
+ {%- if tool_call.arguments %}
71
+ {%- set args_dict = tool_call.arguments %}
72
+ {%- for param_name, param_value in args_dict.items() %}
73
+ {{- '<param name="' ~ param_name ~ '">' }}
74
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
75
+ {{- '<![CDATA[' + param_value + ']]>' }}
76
+ {%- else %}
77
+ {{- param_value }}
78
+ {%- endif %}
79
+ {{- '</param>' }}
80
+ {%- endfor %}
81
+ {%- endif %}
82
+ {{- '</function>' }}
83
+ {%- endset %}
84
+ {%- set processed_content = processed_content + single_tool_xml + content_parts[i] %}
85
+ {%- else %}
86
+ {%- set processed_content = processed_content + content_parts[i] %}
87
+ {%- endif %}
88
+ {%- endfor %}
89
+
90
+ {%- if tool_calls_count > tool_sep_count %}
91
+ {%- for remaining_index in range(tool_sep_count, tool_calls_count) %}
92
+ {%- set tool_call = message.tool_calls[remaining_index] %}
93
+ {%- if tool_call.function %}
94
+ {%- set tool_call = tool_call.function %}
95
+ {%- endif %}
96
+ {%- set remaining_tool_xml %}
97
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
98
+ {%- if tool_call.arguments %}
99
+ {%- set args_dict = tool_call.arguments %}
100
+ {%- for param_name, param_value in args_dict.items() %}
101
+ {{- '<param name="' ~ param_name ~ '">' }}
102
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
103
+ {{- '<![CDATA[' + param_value + ']]>' }}
104
+ {%- else %}
105
+ {{- param_value }}
106
+ {%- endif %}
107
+ {{- '</param>' }}
108
+ {%- endfor %}
109
+ {%- endif %}
110
+ {{- '</function>' }}
111
+ {%- endset %}
112
+ {%- set processed_content = processed_content + remaining_tool_xml %}
113
+ {%- endfor %}
114
+ {%- endif %}
115
+
116
+ {%- set content = processed_content %}
117
+ {%- endif %}
118
+
119
+ {%- if loop.index0 > ns.last_query_index %}
120
+ {%- if reasoning_content %}
121
+ {{- '<|im_start|>' + message.role + '\n<think>\n' + reasoning_content.strip('\n') + '\n</think>\n\n' + content.lstrip('\n') }}
122
+ {%- else %}
123
+ {{- '<|im_start|>' + message.role + '\n' + content }}
124
+ {%- endif %}
125
+ {%- else %}
126
+ {{- '<|im_start|>' + message.role + '\n' + content }}
127
+ {%- endif %}
128
+
129
+ {%- if message.tool_calls and not has_tool_sep %}
130
+ {%- for tool_call in message.tool_calls %}
131
+ {%- if (loop.first and content) or (not loop.first) %}
132
+ {{- '\n' }}
133
+ {%- endif %}
134
+ {%- if tool_call.function %}
135
+ {%- set tool_call = tool_call.function %}
136
+ {%- endif %}
137
+ {{- '<function name="' ~ tool_call.name ~ '">' }}
138
+ {%- if tool_call.arguments %}
139
+ {%- set args_dict = tool_call.arguments %}
140
+ {%- for param_name, param_value in args_dict.items() %}
141
+ {{- '<param name="' ~ param_name ~ '">' }}
142
+ {%- if param_value is string and ('<' in param_value or '&' in param_value or '\n' in param_value) %}
143
+ {{- '<![CDATA[' + param_value + ']]>' }}
144
+ {%- else %}
145
+ {{- param_value }}
146
+ {%- endif %}
147
+ {{- '</param>' }}
148
+ {%- endfor %}
149
+ {%- endif %}
150
+ {{- '</function>' }}
151
+ {%- endfor %}
152
+ {%- endif %}
153
+ {{- '<|im_end|>\n' }}
154
+ {%- elif message.role == "tool" %}
155
+ {%- if loop.first or (messages[loop.index0 - 1].role != "tool") %}
156
+ {{- '<|im_start|>user' }}
157
+ {%- endif %}
158
+ {{- '\n<tool_response>\n' }}
159
+ {%- if message.content is string %}
160
+ {{- content }}
161
+ {%- else %}
162
+ {{- message.content | tojson(ensure_ascii=False) }}
163
+ {%- endif %}
164
+ {{- '\n</tool_response>' }}
165
+ {%- if loop.last or (messages[loop.index0 + 1].role != "tool") %}
166
+ {{- '<|im_end|>\n' }}
167
+ {%- endif %}
168
+ {%- endif %}
169
+ {%- endfor %}
170
+ {%- if add_generation_prompt %}
171
+ {{- '<|im_start|>assistant\n' }}
172
+ {%- if enable_thinking is defined %}
173
+ {%- if enable_thinking is false %}
174
+ {{- '<think>\n\n</think>\n\n' }}
175
+ {%- elif enable_thinking is true %}
176
+ {{- '<think>\n' }}
177
+ {%- endif %}
178
+ {%- endif %}
179
+ {%- endif %}
config.json ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_name_or_path": "openbmb/MiniCPM5-1B",
3
+ "architectures": [
4
+ "LlamaForCausalLM"
5
+ ],
6
+ "bos_token_id": 0,
7
+ "eos_token_id": [
8
+ 1,
9
+ 130073
10
+ ],
11
+ "pad_token_id": 1,
12
+ "hidden_act": "silu",
13
+ "hidden_size": 1536,
14
+ "initializer_range": 0.02,
15
+ "intermediate_size": 4608,
16
+ "max_position_embeddings": 131072,
17
+ "model_type": "llama",
18
+ "num_attention_heads": 16,
19
+ "num_hidden_layers": 24,
20
+ "num_key_value_heads": 2,
21
+ "head_dim": 128,
22
+ "rms_norm_eps": 1e-06,
23
+ "rope_theta": 5000000,
24
+ "rope_scaling": null,
25
+ "tie_word_embeddings": false,
26
+ "torch_dtype": "bfloat16",
27
+ "transformers_version": "5.6.2",
28
+ "use_cache": true,
29
+ "vocab_size": 130560
30
+ }
coreml_config.json ADDED
@@ -0,0 +1,9 @@
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "source_model": "openbmb/MiniCPM5-1B",
3
+ "context": 128,
4
+ "inputs": [
5
+ "input_ids"
6
+ ],
7
+ "weight_quantization": "int4 linear_symmetric per_block block_size=32",
8
+ "minimum_deployment_target": "iOS18/macOS15 era Core ML runtime"
9
+ }
generation_config.json ADDED
@@ -0,0 +1,13 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "_from_model_config": true,
3
+ "bos_token_id": 0,
4
+ "eos_token_id": [
5
+ 1,
6
+ 130073
7
+ ],
8
+ "pad_token_id": 1,
9
+ "do_sample": true,
10
+ "temperature": 0.9,
11
+ "top_p": 0.95,
12
+ "transformers_version": "5.6.2"
13
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,17 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ {
2
+ "add_prefix_space": null,
3
+ "backend": "tokenizers",
4
+ "bos_token": "<s>",
5
+ "clean_up_tokenization_spaces": false,
6
+ "eos_token": "</s>",
7
+ "is_local": true,
8
+ "legacy": true,
9
+ "local_files_only": false,
10
+ "model_max_length": 1000000000000000019884624838656,
11
+ "pad_token": "</s>",
12
+ "sp_model_kwargs": {},
13
+ "spaces_between_special_tokens": false,
14
+ "tokenizer_class": "TokenizersBackend",
15
+ "unk_token": "<unk>",
16
+ "use_default_system_prompt": false
17
+ }