Improve model card: add metadata, link to paper, add model overview
#5
by nielsr (HF Staff), opened

README.md CHANGED

@@ -2,150 +2,327 @@
language:
- en
- zh
tags:
- MiniCPM
- ModelBest
- THUNLP
---

<div align="center">
<h1>
MiniCPM
</h1>
</div>

<p align="center">
<a href="https://
<a href="https://
<a href="https://
</p>

We are investigating the cause now.

- To ensure the universality of the model for academic research purposes, we did not conduct any identity training on the model. Meanwhile, as we used the ShareGPT open-source corpus as part of the training data, the model may output identity information similar to that of the GPT series models.
- Due to the limited model size, the model's output is strongly influenced by the prompt, which may result in inconsistent results across multiple attempts.
- Due to limited model capacity, the model's knowledge recall is not accurate. In the future, we will combine the RAG method to enhance the model's knowledge retention.

## Model Download

| HuggingFace | ModelScope | WiseModel |
|-------------|------------|-----------|
| [sft-bf16](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) | [sft-bf16](https://modelscope.cn/models/OpenBMB/miniCPM-bf16) | [sft-bf16](https://wisemodel.cn/models/OpenBMB/miniCPM-bf16) |
| [sft-fp32](https://huggingface.co/openbmb/MiniCPM-2B-sft-fp32) | [sft-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-sft-fp32) | [sft-fp32](https://wisemodel.cn/models/OpenBMB/miniCPM-dpo-fp32) |
| [dpo-bf16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) | [dpo-bf16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary) | [dpo-bf16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16) |
| [dpo-fp16](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp16) | [dpo-fp16](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp16/) | [dpo-fp16](https://wisemodel.cn/models/OpenBMB/MiniCPM-2B-dpo-fp16) |
| [dpo-fp32](https://huggingface.co/openbmb/MiniCPM-2B-dpo-fp32) | [dpo-fp32](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-fp32) | [dpo-fp32](https://wisemodel.cn/models/OpenBMB/miniCPM-dpo-fp32) |

## Usage

* Install `transformers>=4.36.0` and `accelerate`, then run the following code.
* Warning: you must explicitly specify the model's data type in `from_pretrained`, otherwise large calculation errors will occur.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = 'openbmb/MiniCPM-2B-dpo-bf16'
tokenizer = AutoTokenizer.from_pretrained(path)
# Specify the dtype explicitly to avoid large numerical errors.
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

responds, history = model.chat(tokenizer, "Which mountain is the highest in Shandong Province? Is it higher or lower than Huangshan? By how much?", temperature=0.8, top_p=0.8)
print(responds)
```

#### Model LICENSE

* The code in this repo is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
* The usage of the MiniCPM model weights must strictly follow [the General Model License (GML)](https://github.com/OpenBMB/General-Model-License/blob/main/%E9%80%9A%E7%94%A8%E6%A8%A1%E5%9E%8B%E8%AE%B8%E5%8F%AF%E5%8D%8F%E8%AE%AE-%E6%9D%A5%E6%BA%90%E8%AF%B4%E6%98%8E-%E5%AE%A3%E4%BC%A0%E9%99%90%E5%88%B6-%E5%95%86%E4%B8%9A%E6%8E%88%E6%9D%83.md).
* The models and weights of MiniCPM are completely free for academic research.
* If you intend to utilize the model for commercial purposes, please reach out to cpm@modelbest.cn to obtain written authorization; free commercial use is also permitted after registration.

#### Statement

* As a language model, MiniCPM generates content by learning from a vast amount of text; however, it does not possess the ability to comprehend or express personal opinions or value judgments.
* Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers.
* Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.

#### Citation

* Please cite our [technical report](https://shengdinghu.notion.site/MiniCPM-Unveiling-the-Potential-of-End-side-Large-Language-Models-d4d3a8c426424654a4e80e42a711cb20?pvs=4) if you find our work valuable.


language:
- en
- zh
library_name: transformers
pipeline_tag: text-generation
license: apache-2.0
tags:
- MiniCPM
- ModelBest
- THUNLP
---

<div align="center">
<img src="./assets/minicpm_logo.png" width="500em" />
</div>

<h4 align="center">
<p>
<b>中文</b> | <a href="https://github.com/OpenBMB/MiniCPM/blob/main/README-en.md">English</a>
</p>
</h4>

<p align="center">
<a href="https://openbmb.vercel.app/?category=Chinese+Blog" target="_blank">MiniCPM Technical Blog</a> |
<a href="https://modelbest.feishu.cn/wiki/D2tFw8Pcsi5CIzkaHNacLK64npg" target="_blank">MiniCPM Knowledge Base</a> |
<a href="https://github.com/OpenBMB/MiniCPM/blob/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">MiniCPM Paper</a> |
<a href="https://github.com/OpenBMB/MiniCPM-V/" target="_blank">MiniCPM-V Repo</a> |
Join our <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat group</a>
</p>

This repository contains the model presented in the paper [MiniCPM4: Ultra-Efficient LLMs on End Devices](https://huggingface.co/papers/2506.07900).

## Changelog🔥

- [2025.06.06] **Released [MiniCPM4](https://huggingface.co/collections/openbmb/minicpm-4-6841ab29d180257e940baa9b)! The model achieves the best performance at its scale while delivering extreme efficiency gains, with over 5x generation speedup on typical end-side chips!**
- [2024.09.28] [LLMxMapReduce](https://github.com/thunlp/LLMxMapReduce) is open-sourced, supporting MiniCPM3-4B with theoretically unlimited input length!
- [2024.09.18] [SGLang](https://github.com/sgl-project/sglang) now supports MiniCPM3-4B (recommended)! Since SGLang v0.3 optimizes inference for the MLA structure used in MiniCPM3, throughput is 70% higher than with vLLM.
- [2024.09.16] [llama.cpp](https://github.com/ggerganov/llama.cpp/releases/tag/b3765) officially supports MiniCPM3-4B ([usage](#llamacpp)).
- [2024.09.05] Released [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B)! The model outperforms Phi-3.5-mini-instruct and GPT-3.5-Turbo-0125 and is comparable to several models in the 7B-9B range, such as Llama3.1-8B-Instruct, Qwen2-7B-Instruct, and GLM-4-9B-Chat.
- [2024.07.09] MiniCPM-2B now supports inference with [SGLang](#sglang-推理)!
- [2024.07.05] Released [MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft)! With no loss in downstream task performance, its FFN layers reach an average sparsity of 87.89%, reducing FFN FLOPs by 84%.
- [2024.04.11] Released [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k), [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B), and [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16)! See the technical blog [here](https://openbmb.vercel.app/?category=Chinese+Blog).
- [2024.03.16] Released more than 30 intermediate checkpoints of MiniCPM-2B!
- [2024.02.01] Released [MiniCPM-2B](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16)! The model performs close to Mistral-7B on public benchmarks (with better Chinese, math, and coding ability) and overall surpasses models such as Llama2-13B, MPT-30B, and Falcon-40B.

## Table of Contents

- [Changelog🔥](#更新日志)
- [Table of Contents](#目录)
- [Model Download](#模型下载)
- [MiniCPM 4.0](#minicpm-40)
  - [Evaluation Results](#评测结果)
    - [Efficiency Evaluation](#效率评测)
    - [Comprehensive Evaluation](#综合评测)
    - [Long-Text Evaluation](#长文本评测)
  - [BitCPM4: Model Quantization](#bitcpm4-模型量化)
    - [BitCPM4 Evaluation](#bitcpm4评测)
    - [BitCPM4 Inference](#bitcpm4模型推理)
  - [Model Applications](#模型应用)
    - [MiniCPM4-Survey: Survey Generation](#minicpm4-survey-综述生成)
    - [MiniCPM4-MCP: MCP-Enhanced Tool Calling](#minicpm4-mcp-mcp增强的工具调用)
  - [Model Inference](#模型推理)
    - [CPM.cu](#cpmcu)
    - [HuggingFace](#huggingface)
    - [vLLM](#vllm)
    - [SGLang](#sglang)
  - [Model Fine-Tuning](#模型微调)
    - [LLaMA-Factory](#llama-factory)
- [MiniCPM 3.0](#minicpm-30)
- [MiniCPM 2.0](#minicpm-20)
- [MiniCPM 1.0](#minicpm-10)
- [License](#开源协议)
- [Institutions](#开发机构)
- [Citation](#工作引用)

## Model Download

| HuggingFace | ModelScope |
|-------------|------------|
| [MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B) | [MiniCPM4-8B](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B) |
| [MiniCPM4-0.5B](https://huggingface.co/openbmb/MiniCPM4-0.5B) | [MiniCPM4-0.5B](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-0.5B) |
| [BitCPM4-1B](https://huggingface.co/openbmb/BitCPM4-1B) | [BitCPM4-1B](https://www.modelscope.cn/models/OpenBMB/BitCPM4-1B) |
| [BitCPM4-0.5B](https://huggingface.co/openbmb/BitCPM4-0.5B) | [BitCPM4-0.5B](https://www.modelscope.cn/models/OpenBMB/BitCPM4-0.5B) |
| [MiniCPM4-8B-Eagle-FRSpec](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec) | [MiniCPM4-8B-Eagle-FRSpec](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B-Eagle-FRSpec) |
| [MiniCPM4-8B-Eagle-FRSpec-QAT](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec-QAT) | [MiniCPM4-8B-Eagle-FRSpec-QAT](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B-Eagle-FRSpec-QAT) |
| [MiniCPM4-8B-Eagle-vLLM](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-vLLM) | [MiniCPM4-8B-Eagle-vLLM](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B-Eagle-vLLM) |
| [MiniCPM4-8B-marlin-Eagle-vLLM](https://huggingface.co/openbmb/MiniCPM4-8B-marlin-Eagle-vLLM) | [MiniCPM4-8B-marlin-Eagle-vLLM](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-8B-marlin-Eagle-vLLM) |
| [MiniCPM4-Survey](https://huggingface.co/openbmb/MiniCPM4-Survey) | [MiniCPM4-Survey](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-Survey) |
| [MiniCPM4-MCP](https://huggingface.co/openbmb/MiniCPM4-MCP) | [MiniCPM4-MCP](https://www.modelscope.cn/models/OpenBMB/MiniCPM4-MCP) |
| [MiniCPM4-0.5B-QAT-Int4-unquantized](https://huggingface.co/openbmb/MiniCPM4-0.5B-QAT-Int4-unquantized) | [MiniCPM4-0.5B-QAT-Int4-unquantized](https://modelscope.cn/models/OpenBMB/MiniCPM4-0.5B-QAT-Int4-unquantized) |
| [MiniCPM4-0.5B-QAT-Int4-GPTQ-format](https://huggingface.co/openbmb/MiniCPM4-0.5B-QAT-Int4-GPTQ-format) | [MiniCPM4-0.5B-QAT-Int4-GPTQ-format](https://modelscope.cn/models/OpenBMB/MiniCPM4-0.5B-QAT-Int4-GPTQ-format) |
| [MiniCPM3-4B](https://huggingface.co/openbmb/MiniCPM3-4B) | [MiniCPM3-4B](https://www.modelscope.cn/models/OpenBMB/MiniCPM3-4B) |
| [MiniCPM-2B-sft](https://huggingface.co/openbmb/MiniCPM-2B-sft-bf16) | [MiniCPM-2B-sft](https://modelscope.cn/models/OpenBMB/miniCPM-bf16) |
| [MiniCPM-2B-dpo](https://huggingface.co/openbmb/MiniCPM-2B-dpo-bf16) | [MiniCPM-2B-dpo](https://modelscope.cn/models/OpenBMB/MiniCPM-2B-dpo-bf16/summary) |
| [MiniCPM-2B-128k](https://huggingface.co/openbmb/MiniCPM-2B-128k) | [MiniCPM-2B-128k](https://modelscope.cn/models/openbmb/MiniCPM-2B-128k/summary) |
| [MiniCPM-MoE-8x2B](https://huggingface.co/openbmb/MiniCPM-MoE-8x2B) | [MiniCPM-MoE-8x2B](https://modelscope.cn/models/OpenBMB/MiniCPM-MoE-8x2B) |
| [MiniCPM-1B](https://huggingface.co/openbmb/MiniCPM-1B-sft-bf16) | [MiniCPM-1B](https://modelscope.cn/models/OpenBMB/MiniCPM-1B-sft-bf16) |
| [MiniCPM-S-1B](https://huggingface.co/openbmb/MiniCPM-S-1B-sft) | [MiniCPM-S-1B](https://modelscope.cn/models/OpenBMB/MiniCPM-S-1B-sft) |

Note: more model versions are available [here](https://huggingface.co/collections/openbmb/minicpm-2b-65d48bf958302b9fd25b698f).
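
As a convenience (not part of the original README), any checkpoint in the table can also be fetched programmatically; a minimal sketch with `huggingface_hub`, assuming it is installed (`pip install huggingface_hub`):

```python
# Hedged sketch: download one checkpoint from the table above.
from huggingface_hub import snapshot_download

# Any repo id from the HuggingFace column works here.
local_dir = snapshot_download("openbmb/MiniCPM4-0.5B")
print(local_dir)  # local folder with config.json, weights, and tokenizer files
```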

## MiniCPM 4.0

MiniCPM 4 is an extremely efficient end-side large model, optimized at four levels (model architecture, learning algorithms, training data, and inference systems) to achieve extreme efficiency gains.

- 🏗️ Efficient model architecture:
  - InfLLM v2 -- trainable sparse attention: with a trainable sparse attention architecture, each token attends to less than 5% of the tokens when processing 128K-long text, drastically reducing the computation cost of long sequences.
- 🧠 Efficient learning algorithms:
  - Model Wind Tunnel 2.0 -- efficient predictable scaling: introduces scaling prediction for downstream tasks, enabling more precise search of model-training configurations.
  - BitCPM -- extreme ternary quantization: compresses model parameters to ternary values, achieving a 90% reduction in bit width.
  - Efficient training engineering: adopts FP8 low-precision computation combined with a multi-token prediction training strategy.
- 📚 High-knowledge-density training data:
  - UltraClean -- cleaning and synthesis of high-quality pre-training data: an iterative, verification-driven data-cleaning strategy; the high-quality Chinese-English pre-training dataset [UltraFineweb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb) is open-sourced.
  - UltraChat v2 -- synthesis of high-quality supervised fine-tuning data: a large-scale, high-quality SFT dataset covering knowledge-intensive, reasoning-intensive, instruction-following, long-text-understanding, and tool-calling data.
- ⚡ Efficient inference system:
  - CPM.cu -- lightweight, efficient CUDA inference framework: integrates sparse attention, model quantization, and speculative sampling to fully realize MiniCPM4's efficiency advantage.
  - ArkInfer -- cross-platform deployment system: one-click deployment across multiple backends, with flexible cross-platform adaptation.

### Evaluation Results

#### Efficiency Evaluation

On two typical end-side chips, Jetson AGX Orin and RTX 4090, MiniCPM4 processes long text far faster than models of the same size, and its advantage grows with sequence length. On the Jetson AGX Orin, MiniCPM4 achieves roughly a 7x generation speedup over Qwen3-8B.



#### Comprehensive Evaluation

MiniCPM4 comes in 8B and 0.5B parameter versions, both achieving the best performance among models of comparable size.



#### Long-Text Evaluation

MiniCPM4 is pre-trained on 32K-long text and extends its context length with YaRN. It shows excellent performance on the 128K needle-in-a-haystack task.



### BitCPM4: Model Quantization

BitCPM4 is a family of ternary quantized models obtained by quantization-aware training (QAT) on the MiniCPM series, achieving effective improvements in both training efficiency and parameter efficiency.

- Improved training method
  - Wind-tunnel experiments on small-scale models are used to search for the hyperparameters needed for training.
  - A two-stage recipe (high-precision training first, then QAT) makes full use of fully or partially trained high-precision models, greatly reducing the compute required for the QAT stage.
- High parameter efficiency
  - At a bit width of 1.58 bits, the model matches the performance of full-precision models at the same parameter scale, demonstrating high parameter efficiency.

#### BitCPM4 Evaluation

BitCPM4's benchmark performance is comparable to mainstream full-precision models of similar size.



#### BitCPM4 Inference

BitCPM4's released parameters are stored in a fake-quantized format, so they can be run directly with the Huggingface framework, as in the sketch below.
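
A minimal sketch (assuming the standard `transformers` chat-template flow used elsewhere in this README; the repo id comes from the download table above):

```python
# Hedged sketch: run the fake-quantized BitCPM4 weights with plain transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/BitCPM4-1B"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
)

messages = [{"role": "user", "content": "Explain ternary quantization in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```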

### Model Applications

#### MiniCPM4-Survey: Survey Generation

MiniCPM4-Survey is an open-source LLM agent jointly developed by [THUNLP](https://nlp.csai.tsinghua.edu.cn), Renmin University of China, and [ModelBest](https://modelbest.cn/en). Built on the MiniCPM4-8B base model, it accepts user queries as input and autonomously generates trustworthy, long-form survey papers.

Key features include:
- Plan-Retrieve-Write generation framework -- We propose a multi-agent generation framework with three core stages: planning (defining the overall structure of the survey), retrieval (generating appropriate retrieval keywords), and writing (using the retrieved information to generate coherent sections).
- High-quality dataset construction -- We collect and process a large number of expert-written survey papers to build a high-quality training set, and gather a large corpus of research papers to build a retrieval database.
- Multi-aspect reward design -- We carefully design rewards covering structure, content, and citations to assess survey quality, used as the reward function during reinforcement-learning training.
- Multi-step RL training strategy -- We propose a context manager that preserves essential information while enabling effective reasoning, and build a parallel environment to keep RL training efficient.

##### Usage and Demo

See [here](./demo/minicpm4/SurveyGeneration/README.md).

##### Evaluation

| Method | Relevance | Coverage | Depth | Novelty | Avg. | Fact Score |
|---------------------------------------------|-----------|----------|-------|---------|-------|------------|
| Naive RAG (driven by G2FT) | 3.25 | 2.95 | 3.35 | 2.60 | 3.04 | 43.68 |
| AutoSurvey (driven by G2FT) | 3.10 | 3.25 | 3.15 | **3.15** | 3.16 | 46.56 |
| Webthinker (driven by WTR1-7B) | 3.30 | 3.00 | 2.75 | 2.50 | 2.89 | -- |
| Webthinker (driven by QwQ-32B) | 3.40 | 3.30 | 3.30 | 2.50 | 3.13 | -- |
| OpenAI Deep Research (driven by GPT-4o) | 3.50 | **3.95** | 3.55 | 3.00 | **3.50** | -- |
| MiniCPM4-Survey | 3.45 | 3.70 | **3.85** | 3.00 | **3.50** | **68.73** |
| *w/o* RL | **3.55** | 3.35 | 3.30 | 2.25 | 3.11 | 50.24 |

*Performance comparison of survey generation systems, judged by GPT-4o. "G2FT" stands for Gemini-2.0-Flash-Thinking and "WTR1-7B" for Webthinker-R1-7B. Because Webthinker has no citation feature and OpenAI Deep Research does not provide citations when exporting results, their FactScore evaluations are omitted. Our technical report contains the details of the evaluation.*

#### MiniCPM4-MCP: MCP-Enhanced Tool Calling

MiniCPM4-MCP is an open-source on-device LLM agent jointly developed by the [Natural Language Processing Lab at Tsinghua University (THUNLP)](https://nlp.csai.tsinghua.edu.cn), Renmin University of China, and [ModelBest](https://modelbest.cn/en). Built on the 8-billion-parameter MiniCPM-4-8B, it interacts with a variety of tools and data resources through the MCP protocol to solve a wide range of real-world tasks. So far, MiniCPM4-MCP supports:

- Tool use across 16 MCP servers: the tools on these servers span office, lifestyle, communication, information, and work-management categories.

- Single-tool use: single-step or multi-step calls to one MCP-compliant tool.

- Cross-tool composition: combined use of different MCP-compliant tools (a hedged tool-calling sketch follows this list).

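Below is a minimal sketch of a single tool-calling turn, assuming MiniCPM4-MCP's chat template accepts a `tools` list the way recent `transformers` templates do; the `get_weather` schema is purely illustrative and not an actual MCP server tool:

```python
# Hedged sketch: one tool-calling turn with MiniCPM4-MCP.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

path = "openbmb/MiniCPM4-MCP"
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path, torch_dtype=torch.bfloat16, device_map="cuda", trust_remote_code=True
)

tools = [{  # hypothetical tool schema, for illustration only
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
messages = [{"role": "user", "content": "What's the weather in Beijing today?"}]
input_ids = tokenizer.apply_chat_template(
    messages, tools=tools, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(input_ids, max_new_tokens=256)
# The model is expected to emit a structured function call for the MCP client to execute.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```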

##### Usage and Demo

See [here](./demo/minicpm4/MCP/README.md).

##### Evaluation

| MCP Server | gpt-4o func. | gpt-4o param. | gpt-4o value | qwen3 func. | qwen3 param. | qwen3 value | minicpm4 func. | minicpm4 param. | minicpm4 value |
| -------------------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- | -------- |
| Airbnb | 89.3 | 67.9 | 53.6 | 92.8 | 60.7 | 50.0 | 96.4 | 67.9 | 50.0 |
| Amap-Maps | 79.8 | 77.5 | 50.0 | 74.4 | 72.0 | 41.0 | 89.3 | 85.7 | 39.9 |
| Arxiv-MCP-Server | 85.7 | 85.7 | 85.7 | 81.8 | 54.5 | 50.0 | 57.1 | 57.1 | 52.4 |
| Calculator | 100.0 | 100.0 | 20.0 | 80.0 | 80.0 | 13.3 | 100.0 | 100.0 | 6.67 |
| Computor-Control-MCP | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 86.7 |
| Desktop-Commander | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 | 100.0 |
| Filesystem | 63.5 | 63.5 | 31.3 | 69.7 | 69.7 | 26.0 | 83.3 | 83.3 | 42.7 |
| Github | 92.0 | 80.0 | 58.0 | 80.5 | 50.0 | 27.7 | 62.8 | 25.7 | 17.1 |
| Gaode | 71.1 | 55.6 | 17.8 | 68.8 | 46.6 | 24.4 | 68.9 | 46.7 | 15.6 |
| MCP-Code-Executor | 85.0 | 80.0 | 70.0 | 80.0 | 80.0 | 70.0 | 90.0 | 90.0 | 65.0 |
| MCP-Docx | 95.8 | 86.7 | 67.1 | 94.9 | 81.6 | 60.1 | 95.1 | 86.6 | 76.1 |
| PPT | 72.6 | 49.8 | 40.9 | 85.9 | 50.7 | 37.5 | 91.2 | 72.1 | 56.7 |
| PPTx | 64.2 | 53.7 | 13.4 | 91.0 | 68.6 | 20.9 | 91.0 | 58.2 | 26.9 |
| Simple-Time-Server | 90.0 | 70.0 | 70.0 | 90.0 | 90.0 | 90.0 | 90.0 | 60.0 | 60.0 |
| Slack | 100.0 | 90.0 | 70.0 | 100.0 | 100.0 | 65.0 | 100.0 | 100.0 | 100.0 |
| Whisper | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 90.0 | 30.0 |
| **Average** | **80.2** | **70.2** | **49.1** | **83.5** | **67.7** | **43.8** | **88.3** | **76.1** | **51.2** |

*Each model block reports function-name accuracy ("func."), parameter-name accuracy ("param."), and parameter-value accuracy ("value"), in percent.*

### Model Inference

#### CPM.cu

We **recommend** using [CPM.cu](https://github.com/OpenBMB/CPM.cu) for inference with MiniCPM4. CPM.cu is a CUDA inference framework developed by ModelBest that integrates efficient sparsity, speculative sampling, and quantization, fully realizing MiniCPM4's efficiency advantage.

You can install CPM.cu with the following script:

```bash
git clone https://github.com/OpenBMB/CPM.cu.git --recursive
cd CPM.cu
python3 setup.py install
```

You can then run inference and measure the model's generation speed with the following commands:

```bash
python3 tests/long_prompt_gen.py # generate prompt.txt
python3 tests/test_generate.py --prompt-file prompt.txt
```

For more details about CPM.cu, see the [CPM.cu repo](https://github.com/OpenBMB/CPM.cu).

#### HuggingFace

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = 'openbmb/MiniCPM4-8B'
device = "cuda"
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

# Users can directly use the chat interface
# responds, history = model.chat(tokenizer, "Write an article about Artificial Intelligence.", temperature=0.7, top_p=0.7)
# print(responds)

# Users can also use the generate interface
messages = [
    {"role": "user", "content": "Write an article about Artificial Intelligence."},
]
prompt_text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,
)
model_inputs = tokenizer([prompt_text], return_tensors="pt").to(device)

model_outputs = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    top_p=0.7,
    temperature=0.7
)
# Strip the prompt tokens from each output before decoding.
output_token_ids = [
    model_outputs[i][len(model_inputs['input_ids'][i]):] for i in range(len(model_inputs['input_ids']))
]

responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
print(responses)
```

This model supports InfLLM v2, a sparse attention mechanism for efficient long-sequence inference. To enable it, first install the dependency library [infllmv2_cuda_impl](https://github.com/OpenBMB/infllmv2_cuda_impl).

Install it with the following commands:

```bash
git clone -b feature_infer https://github.com/OpenBMB/infllmv2_cuda_impl.git
cd infllmv2_cuda_impl
git submodule update --init --recursive
pip install -e . # or python setup.py install
```

To enable InfLLM v2, add a `sparse_config` field to `config.json`:

```json
{
    ...,
    "sparse_config": {
        "kernel_size": 32,
        "kernel_stride": 16,
        "init_blocks": 1,
        "block_size": 64,
        "window_size": 2048,
        "topk": 64,
        "use_nope": false,
        "dense_len": 8192
    }
}
```

These parameters control the behavior of InfLLM v2:

* `kernel_size` (default: 32): size of semantic kernels.
* `kernel_stride` (default: 16): stride between adjacent semantic kernels.
* `init_blocks` (default: 1): number of initial blocks every query token attends to, ensuring attention to the beginning of the sequence.
* `block_size` (default: 64): block size of the key-value blocks.
* `window_size` (default: 2048): size of the local sliding window.
* `topk` (default: 64): each token computes attention only with the top-k most relevant key-value blocks.
* `use_nope` (default: false): whether to use the NOPE technique in block selection for better performance.
* `dense_len` (default: 8192): since sparse attention offers limited benefit for short sequences, the model automatically falls back to standard dense attention below this token length; set it to `-1` to always use sparse attention. A config-patching sketch follows this list.
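
If you prefer not to edit `config.json` by hand, the same field can be set at load time; a hedged sketch, assuming the MiniCPM4 remote code reads `sparse_config` from the config object as the JSON example above implies:

```python
# Hedged sketch: inject sparse_config programmatically instead of editing config.json.
from transformers import AutoConfig, AutoModelForCausalLM
import torch

path = "openbmb/MiniCPM4-8B"
config = AutoConfig.from_pretrained(path, trust_remote_code=True)
config.sparse_config = {
    "kernel_size": 32,
    "kernel_stride": 16,
    "init_blocks": 1,
    "block_size": 64,
    "window_size": 2048,
    "topk": 64,
    "use_nope": False,
    "dense_len": 8192,
}
model = AutoModelForCausalLM.from_pretrained(
    path, config=config, torch_dtype=torch.bfloat16,
    device_map="cuda", trust_remote_code=True,
)
```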

MiniCPM4 natively supports context lengths of up to 32,768 tokens. If the total length of a conversation (input plus output) far exceeds this limit, we recommend RoPE scaling to extend the context: we have validated that adjusting the LongRoPE factors lets the model stably handle contexts of up to 131,072 tokens.

To do so, adjust the `rope_scaling` field in `config.json`:

```json
{
    ...,
    "rope_scaling": {
        "rope_type": "longrope",
        "long_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.4815
|