Add Chinese model card (README-cn.md)
README-cn.md (added, +240 -0)
@@ -0,0 +1,240 @@
---
license: apache-2.0
language:
- zh
- en
library_name: transformers
pipeline_tag: text-generation
tags:
- minicpm
- minicpm5
- llama
- text-generation
- long-context
- tool-calling
- on-device
- edge-ai
datasets:
- openbmb/Ultra-FineWeb-L3
- openbmb/UltraData-SFT-2605
---

<div align="center">
<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm_logo.png" width="500em" />
</div>

<p align="center">
<a href="https://arxiv.org/pdf/2506.07900" target="_blank">MiniCPM Paper</a> |
<a href="https://github.com/OpenBMB/MiniCPM/tree/minicpm5" target="_blank">GitHub Repo</a> |
<a href="https://huggingface.co/openbmb/MiniCPM5-1B/blob/main/README.md" target="_blank">English</a> |
<a href="https://ultradata.openbmb.cn/" target="_blank">UltraData</a> |
<a href="https://github.com/OpenBMB/MiniCPM-Desk-Pet" target="_blank">MiniCPM Desk Pet</a>
</p>

> This model card is shared across the **entire MiniCPM5-1B family** (the final release, the standalone SFT version, the base checkpoint, and the corresponding GGUF / MLX / AWQ / GPTQ variants). The current model is marked in the [Model List](#model-list) below.

## Highlights

We are officially releasing **MiniCPM5-1B**, the first model in the **MiniCPM5** series. It is a 1B dense Transformer designed for on-device use, local deployment, and resource-constrained settings, and it reaches SOTA among open-source models of the same size on benchmark evaluations.

🏆 **SOTA among same-size open-source models**: Compared with strong open-source models of similar size, MiniCPM5-1B reaches SOTA within that comparison set, with its main advantages in agentic tool calling, code generation, and hard reasoning.

![MiniCPM5-1B average scores](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_avg_scores.png)

🧠 **Dual-mode reasoning**: The built-in chat template supports `<think>`, and `enable_thinking` switches between thinking and non-thinking modes. The same weights can serve as a fast assistant or take on more complex reasoning tasks.

🛠️ **Deployment / fine-tuning resources**: The MiniCPM GitHub repository provides single-page cookbooks for the major inference backends and fine-tuning frameworks, along with matching Agent Skills, so that deployment and fine-tuning workflows are easy to reproduce.

🐱 **Desk pet**: We also provide a desk pet application driven locally by MiniCPM5-1B. Click the cover below to open the demo video.

[<img src="https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_desktop_pet_cover.png" alt="MiniCPM Desk Pet" width="720">](https://github.com/OpenBMB/MiniCPM/raw/minicpm5/assets/minicpm5/minicpm5_desktop_pet_demo.mp4)

## Model List

Choose the model format that matches your runtime environment:

- **[MiniCPM5-1B](https://huggingface.co/openbmb/MiniCPM5-1B)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B) · BF16 final release (post-trained with RL + OPD) **👈 this page**
- **[MiniCPM5-1B-SFT](https://huggingface.co/openbmb/MiniCPM5-1B-SFT)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-SFT) · BF16 standalone SFT checkpoint (before RL / OPD)
- **[MiniCPM5-1B-Base](https://huggingface.co/openbmb/MiniCPM5-1B-Base)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-Base) · BF16 base checkpoint (pre-training only)
- **[MiniCPM5-1B-GGUF](https://huggingface.co/openbmb/MiniCPM5-1B-GGUF)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-GGUF) · GGUF, for llama.cpp / Ollama / LM Studio
- **[MiniCPM5-1B-MLX](https://huggingface.co/openbmb/MiniCPM5-1B-MLX)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-MLX) · MLX / 4bit, for Apple Silicon
- **[MiniCPM5-1B-AWQ](https://huggingface.co/openbmb/MiniCPM5-1B-AWQ)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-AWQ) · AWQ-Marlin Int4, for vLLM
- **[MiniCPM5-1B-GPTQ](https://huggingface.co/openbmb/MiniCPM5-1B-GPTQ)** · [ModelScope](https://www.modelscope.cn/models/OpenBMB/MiniCPM5-1B-GPTQ) · GPTQ-Marlin Int4, for vLLM

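As a small illustration of choosing a format by runtime, the list above can be written as a lookup table. This is only a sketch: the repo IDs are copied from the list, while the runtime labels (`"vllm-awq"`, `"lmstudio"`, and so on) and the `pick_repo` helper are hypothetical names invented for this example.

```python
# Repo IDs are copied from the model list; the runtime keys are illustrative
# labels chosen for this sketch, not official names.
REPO_BY_RUNTIME = {
    "transformers": "openbmb/MiniCPM5-1B",
    "llama.cpp": "openbmb/MiniCPM5-1B-GGUF",
    "ollama": "openbmb/MiniCPM5-1B-GGUF",
    "lmstudio": "openbmb/MiniCPM5-1B-GGUF",
    "mlx": "openbmb/MiniCPM5-1B-MLX",
    "vllm-awq": "openbmb/MiniCPM5-1B-AWQ",
    "vllm-gptq": "openbmb/MiniCPM5-1B-GPTQ",
}


def pick_repo(runtime: str) -> str:
    """Return the Hugging Face repo ID for a runtime label (case-insensitive)."""
    key = runtime.strip().lower()
    if key not in REPO_BY_RUNTIME:
        known = ", ".join(sorted(REPO_BY_RUNTIME))
        raise ValueError(f"unknown runtime {runtime!r}; expected one of: {known}")
    return REPO_BY_RUNTIME[key]


print(pick_repo("Ollama"))
```
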
## Model Information

MiniCPM5-1B has the following characteristics:

- **Type**: Causal Language Model
- **Architecture**: standard `LlamaForCausalLM`
- **Number of Parameters**: 1,080,632,832
- **Number of Non-Embedding Parameters**: 679,552,512
- **Layers**: 24
- **Attention Heads (GQA)**: 16 Q heads / 2 KV heads
- **Context Length**: 131,072

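Two quantities follow directly from the figures in this list: the number of embedding parameters (total minus non-embedding) and the number of query heads that share each KV head. The sketch below only does that arithmetic; how the embedding parameters are divided between input and output embeddings is not stated in this card, so it is left out.

```python
# All constants are taken from the model information list.
TOTAL_PARAMS = 1_080_632_832
NON_EMBEDDING_PARAMS = 679_552_512
Q_HEADS, KV_HEADS = 16, 2

# Embedding parameters are whatever remains after the non-embedding weights.
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS
embedding_share = embedding_params / TOTAL_PARAMS

# With GQA, several query heads attend through the same key/value head.
gqa_ratio = Q_HEADS // KV_HEADS

print(f"embedding parameters: {embedding_params:,}")
print(f"embedding share of total: {embedding_share:.1%}")
print(f"query heads per KV head: {gqa_ratio}")
```

The ratio of 8 matches the "GQA 8:1" figure quoted elsewhere in this card. Because the KV cache grows with the number of KV heads, 2 KV heads store one eighth of what 16 full heads of the same dimension would.
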
## Introduction

MiniCPM5-1B is a compact dense decoder-only Transformer trained to improve output quality at the 1B-parameter scale. It uses the standard `LlamaForCausalLM` architecture (24 layers, GQA 8:1, native 128K context, 1,080,632,832 parameters) and loads directly in mainstream inference backends such as Transformers, vLLM, SGLang, llama.cpp, MLX, Ollama, and LM Studio, with no custom operators required.

For full architecture details and a per-component parameter breakdown, see the [Transformers deployment cookbook](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/transformers.md) on GitHub.

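## Evaluation Results

We compare against strong open-source models of similar size, including **LFM2.5-1.2B-Thinking**, **Qwen3-0.6B/think**, and **Qwen3.5-0.8B/think**. These models are already strong. Within this comparison, MiniCPM5-1B reaches SOTA among same-size open-source models, with its main advantages in tool calling, code generation, and hard reasoning, which also makes it better suited to act as a local coding agent, tool assistant, or reasoning assistant.

![MiniCPM5-1B benchmark results](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_benchmark.png)
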
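## Training Pipeline

The training of MiniCPM5-1B is a complete application of the **[UltraData tiered data management system](https://ultradata.openbmb.cn/)**, covering three stages: base training, mid-training, and post-training.

**Base training** uses a staged recipe consisting of stable training and decay training, which establishes basic language ability and training stability. **Mid-training** follows, further strengthening target capabilities and adapting to the data distribution. The training corpus comes from [Ultra-FineWeb-L3](https://huggingface.co/datasets/openbmb/Ultra-FineWeb-L3), which we are open-sourcing at the same time.

**Post-training** has three steps: **SFT**, **RL**, and **OPD**. We first use **200B tokens of deep-thinking SFT** and **200B tokens of hybrid-thinking SFT** to build deep-thinking, hybrid-thinking, and general conversation abilities; the SFT data is open-sourced as [UltraData-SFT-2605](https://huggingface.co/datasets/openbmb/UltraData-SFT-2605). We then train specialized **RL teachers** for math, code, closed-book QA, writing, and other domains, and distill their capabilities back into a single release model through **On-Policy Distillation (OPD)**.

![MiniCPM5-1B training pipeline](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_training_pipeline.png)
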
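### What do RL + OPD bring?

**RL + OPD** is a key part of MiniCPM5-1B post-training. Across three task types (math, code, and instruction following), RL + OPD raises the average score by **↑16 points** and lowers the share of responses that hit the max-tokens budget by **↓29 percentage points**. The figures below show the two-stage Reasoning RL pipeline, the score gains, and the drop in the overlong rate.

The **RL** stage combines several complementary training signals. Reasoning RL uses [DAPO-Math-17k](https://huggingface.co/datasets/BytedTsinghua-SIA/DAPO-Math-17k) to strengthen mathematical reasoning. Closed-book QA uses [TriviaQA](https://huggingface.co/datasets/mandarjoshi/trivia_qa) and [NQ-Open](https://huggingface.co/datasets/google-research-datasets/nq_open), with a system prompt that guides the model to admit it does not know when it is uncertain rather than guess at random. Writing ability comes from [LongWriter-Zero-RLData](https://huggingface.co/datasets/THU-KEG/LongWriter-Zero-RLData); instruction following and long-context understanding use verifiable RLVR data synthesized from general corpora. For general conversation, pair-wise RLHF signals are built from anchor responses, with a Generative Reward Model making the preference judgments.

![Reasoning RL two-stage training pipeline](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_rl_two_stage.png)

The **OPD** stage follows the approach of Thinking Machines Lab's [On-Policy Distillation](https://thinkingmachines.ai/blog/on-policy-distillation/) and incorporates implementation improvements from [Rethinking On-Policy Distillation](https://arxiv.org/pdf/2604.13016). Within the reinforcement learning framework, we use the reverse KL divergence as the advantage estimate in place of the original verification-based advantage. At each position of the response sequence, we apply two-sided top-k sampling to the student and teacher logits separately, take the union of the two sets, and compute the reverse KL divergence over it, which balances the accuracy of the supervision signal against training efficiency. OPD directly reuses the same-distribution prompts that each RL teacher was trained on as distillation data, so no additional corpus has to be built.

![RL + OPD score gain vs SFT baseline](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_rl_score_gain.png)

![RL + OPD overlong rate reduction](https://raw.githubusercontent.com/OpenBMB/MiniCPM/minicpm5/assets/minicpm5/minicpm5_rl_overlong_reduction.png)
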
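To make the top-k union idea concrete, here is a minimal single-position sketch in plain Python. It is not the authors' implementation. Two details are assumptions of this example because the text does not specify them: both distributions are renormalized over the union of the two top-k sets, and the per-token advantage is taken as the negative reverse KL. The function names and the toy logits are hypothetical.

```python
import math


def top_k_indices(logits, k):
    """Indices of the k largest logits."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def softmax_over(logits, indices):
    """Softmax restricted to `indices` (renormalizing here is an assumption)."""
    peak = max(logits[i] for i in indices)
    exps = {i: math.exp(logits[i] - peak) for i in indices}
    total = sum(exps.values())
    return {i: value / total for i, value in exps.items()}


def reverse_kl_topk_union(student_logits, teacher_logits, k):
    """KL(student || teacher) at one position over the union of both top-k sets."""
    support = sorted(
        set(top_k_indices(student_logits, k)) | set(top_k_indices(teacher_logits, k))
    )
    p_student = softmax_over(student_logits, support)
    p_teacher = softmax_over(teacher_logits, support)
    return sum(
        p_student[i] * (math.log(p_student[i]) - math.log(p_teacher[i]))
        for i in support
    )


# Toy logits over a five-token vocabulary, made up for illustration.
student = [2.0, 0.5, -1.0, 0.1, -3.0]
teacher = [0.2, 2.5, -1.0, 0.0, -2.0]

kl = reverse_kl_topk_union(student, teacher, k=2)
advantage = -kl  # assumed sign convention: a smaller divergence is better
print(f"reverse KL = {kl:.4f}, advantage = {advantage:.4f}")
```

Restricting the sum to the union keeps the cost per position proportional to at most `2 * k` tokens instead of the full vocabulary, which is one way to read the accuracy versus efficiency trade-off described above.
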
## Quickstart

### vLLM

```bash
pip install "vllm>=0.21"
vllm serve openbmb/MiniCPM5-1B --port 8000
```

```bash
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM5-1B",
    "messages": [{"role": "user", "content": "Who are you? Can you briefly introduce yourself?"}],
    "max_tokens": 128,
    "temperature": 0.7
  }'
```

### SGLang

```bash
pip install "sglang[srt]>=0.5.12"
python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000
```

```bash
curl http://localhost:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openbmb/MiniCPM5-1B",
    "messages": [{"role": "user", "content": "Who are you? Can you briefly introduce yourself?"}],
    "max_tokens": 128,
    "temperature": 0.7
  }'
```

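Both servers expose the same `/v1/chat/completions` route, so the curl calls above can be reproduced from Python with the standard library alone. The sketch below is one way to do it; `build_chat_payload` and `chat` are hypothetical helper names, and the base URL assumes a local server on port 8000 (vLLM) or 30000 (SGLang) started with the commands shown earlier.

```python
import json
import urllib.request


def build_chat_payload(prompt, model="openbmb/MiniCPM5-1B", max_tokens=128, temperature=0.7):
    """Build the same JSON body that the curl examples send."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }


def chat(prompt, base_url="http://localhost:8000/v1", timeout=60):
    """POST a single-turn chat request and return the assistant's reply text."""
    body = json.dumps(build_chat_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        result = json.load(response)
    # OpenAI-compatible servers return the reply under choices[0].message.content.
    return result["choices"][0]["message"]["content"]
```

With a server running, `chat("Who are you?")` returns the reply from vLLM, and `chat("Who are you?", base_url="http://localhost:30000/v1")` targets SGLang.
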
### Transformers

```bash
pip install -U "transformers>=5.6" accelerate torch
```

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "openbmb/MiniCPM5-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",
    device_map="auto",
)

messages = [{"role": "user", "content": "Who are you? Can you briefly introduce yourself?"}]
# return_dict=True yields input_ids plus attention_mask, so the call behaves
# the same whether or not the installed version returns a dict by default.
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    enable_thinking=False,  # set to True for thinking mode
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
prompt_len = inputs["input_ids"].shape[-1]
print(tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True))
```

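When `enable_thinking=True`, the reply is expected to carry its reasoning inside `<think>` tags ahead of the final answer. The helper below is a sketch for separating the two parts of a decoded string. It assumes the literal `<think>` and `</think>` markers survive decoding, which may depend on how the tokenizer treats them and on `skip_special_tokens`; `split_thinking` is a hypothetical name.

```python
def split_thinking(text, open_tag="<think>", close_tag="</think>"):
    """Return (reasoning, answer); reasoning is '' when no closing tag is found."""
    head, separator, tail = text.partition(close_tag)
    if not separator:
        # No closing tag: treat the whole text as the answer.
        return "", text.strip()
    # The opening tag may be missing if the template already emitted it in the prompt.
    reasoning = head.split(open_tag, 1)[-1].strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>2 + 2 = 4</think>The answer is 4.")
print(reasoning)
print(answer)
```
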
Recommended sampling parameters for each chat template mode:

| Mode | Recommended sampling parameters | How to enable |
| --- | --- | --- |
| **Think** | `temperature=0.9, top_p=0.95` | `enable_thinking=True` |
| **No Think** | `temperature=0.7, top_p=0.95` | `enable_thinking=False` |

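The table can be encoded once and reused for both local generation and server requests. This is a sketch with hypothetical helper names. `do_sample`, `temperature`, `top_p`, and `max_new_tokens` are standard `generate` arguments in Transformers; passing `chat_template_kwargs` in the request body is an assumption about the serving side and should be checked against the server version in use.

```python
# Values copied from the table above, keyed by the enable_thinking flag.
RECOMMENDED_SAMPLING = {
    True: {"temperature": 0.9, "top_p": 0.95},   # Think
    False: {"temperature": 0.7, "top_p": 0.95},  # No Think
}


def generation_kwargs(enable_thinking, max_new_tokens=128):
    """Keyword arguments for `model.generate` with the recommended sampling values."""
    return {
        "do_sample": True,  # sampling must be on for temperature / top_p to apply
        "max_new_tokens": max_new_tokens,
        **RECOMMENDED_SAMPLING[enable_thinking],
    }


def request_fields(enable_thinking):
    """Extra JSON fields for an OpenAI-compatible chat request."""
    return {
        **RECOMMENDED_SAMPLING[enable_thinking],
        # Assumption: the server forwards these kwargs to the chat template.
        "chat_template_kwargs": {"enable_thinking": enable_thinking},
    }


print(generation_kwargs(enable_thinking=True))
print(request_fields(enable_thinking=False))
```

For local use, the result would be passed as `model.generate(**inputs, **generation_kwargs(True))`, with the same flag also given to `apply_chat_template` so the prompt and the sampling settings agree.
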
## Tool Calling

For tool calling / function calling, **SGLang is recommended**. MiniCPM5-1B emits tool calls in XML format, and SGLang's built-in `minicpm5` parser automatically converts them into the OpenAI-compatible `tool_calls` field.

```bash
python -m sglang.launch_server --model-path openbmb/MiniCPM5-1B --port 30000 \
  --tool-call-parser minicpm5  # or: --tool-call-parser auto
```

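On the client side, a tool-calling round trip means sending a `tools` array in the OpenAI function-calling schema and reading `tool_calls` from the reply. The sketch below shows both halves without contacting a server. The `get_weather` tool, the helper names, and the sample response are all hypothetical; the response is hand-written in the OpenAI shape only to exercise the parsing code.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt, tools, model="openbmb/MiniCPM5-1B"):
    """Chat completion body that offers `tools` to the model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": tools,
    }


def extract_tool_calls(response):
    """Return [(name, arguments_dict), ...] from a chat completion response dict."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        function = call["function"]
        # In the OpenAI format, arguments arrive as a JSON-encoded string.
        calls.append((function["name"], json.loads(function["arguments"])))
    return calls


# Hand-written response in the OpenAI shape, used only to exercise the parser.
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": None,
                "tool_calls": [
                    {
                        "id": "call_0",
                        "type": "function",
                        "function": {
                            "name": "get_weather",
                            "arguments": "{\"city\": \"Beijing\"}",
                        },
                    }
                ],
            }
        }
    ]
}

print(build_tool_request("What is the weather in Beijing?", [WEATHER_TOOL])["tools"][0]["function"]["name"])
print(extract_tool_calls(sample_response))
```
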
## GitHub Cookbooks and Agent Skills

MiniCPM5-1B uses the **standard `LlamaForCausalLM` architecture**, so mainstream inference engines can load it directly, with **no custom operators and no forked model code**. For step-by-step deployment and fine-tuning instructions, see the GitHub cookbooks below. The Agent Skills are provided as GitHub resources for users of coding agents such as Cursor and Claude Code.

| Backend / Framework | Model format / Use case | Cookbook | Agent Skill |
| --- | --- | --- | --- |
| Transformers | BF16 / FP16, local Python inference, GPU + CPU | [transformers.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/transformers.md) | [minicpm5-deploy-transformers](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-transformers/SKILL.md) |
| vLLM | BF16 / FP16 OpenAI server; supports AWQ / GPTQ-Marlin Int4 quantized weights | [vllm.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/vllm.md); quantization: [awq.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/awq.md) / [gptq.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/gptq.md) | [minicpm5-deploy-vllm](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-vllm/SKILL.md); quantization: [awq](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-awq/SKILL.md) / [gptq](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-gptq/SKILL.md) |
| SGLang | BF16 / FP16 OpenAI server, recommended for tool calling | [sglang.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/sglang.md) | [minicpm5-deploy-sglang](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-sglang/SKILL.md) |
| llama.cpp | GGUF, local CPU/GPU inference | [llama_cpp.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/llama_cpp.md) | [minicpm5-deploy-llama-cpp](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-llama-cpp/SKILL.md) |
| Ollama | GGUF, local on-device use | [ollama.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/ollama.md) | [minicpm5-deploy-ollama](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-ollama/SKILL.md) |
| LM Studio | GGUF, Mac desktop app and OpenAI server | [lmstudio.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/lmstudio.md) | [minicpm5-deploy-lmstudio](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-lmstudio/SKILL.md) |
| MLX | MLX / 4bit, local inference on Apple Silicon | [mlx.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/deployment/mlx.md) | [minicpm5-deploy-mlx](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-deploy-mlx/SKILL.md) |
| TRL + PEFT | LoRA / SFT fine-tuning | [trl.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/trl.md) | [minicpm5-finetune-trl](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-trl/SKILL.md) |
| LLaMA-Factory | Fine-tuning | [llamafactory.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/llamafactory.md) | [minicpm5-finetune-llamafactory](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-llamafactory/SKILL.md) |
| ms-swift | Fine-tuning | [ms_swift.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/ms_swift.md) | [minicpm5-finetune-ms-swift](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-ms-swift/SKILL.md) |
| unsloth | Fine-tuning | [unsloth.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/unsloth.md) | [minicpm5-finetune-unsloth](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-unsloth/SKILL.md) |
| xtuner | Fine-tuning | [xtuner.md](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/docs/finetune/xtuner.md) | [minicpm5-finetune-xtuner](https://github.com/OpenBMB/MiniCPM/blob/minicpm5/skills/minicpm5-finetune-xtuner/SKILL.md) |

## Desk Pet

We have also released **[OpenBMB/MiniCPM-Desk-Pet](https://github.com/OpenBMB/MiniCPM-Desk-Pet)**, a desk pet application driven locally by MiniCPM5-1B. It supports Apple Silicon, NVIDIA GPU, and CPU setups, can work together with coding agents such as Cursor, Claude Code, and Codex, and supports switching personas via LoRA.

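## Limitations and Responsible Use

MiniCPM5-1B is a language model that generates text from statistical patterns in its training data, and it may produce inaccurate, biased, or unsafe content. Review and verify model outputs before relying on them in high-risk settings.

Users are responsible for evaluating model outputs, putting the necessary safeguards in place, and complying with applicable laws, regulations, and platform policies.
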
## License

MiniCPM model weights and related code are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) license.

## Citation

If you find our work helpful, please cite:

```bibtex
@article{minicpm4,
  title={Minicpm4: Ultra-efficient llms on end devices},
  author={MiniCPM, Team},
  journal={arXiv preprint arXiv:2506.07900},
  year={2025}
}
```