nielsr HF Staff commited on
Commit
7d0f057
·
verified ·
1 Parent(s): b978ad4

Improve model card: Add paper/project links, abstract, and code-generation tag

Browse files

This PR significantly improves the model card for `openbmb/MiniCPM4-8B` by:

- Adding a direct link to the official Hugging Face paper page.
- Including a link to the Hugging Face collection acting as the project page.
- Adding a dedicated "Paper Abstract" section with the model's abstract.
- Adding the `code-generation` tag to the metadata, reflecting the model's capabilities as evidenced by its evaluations and features (e.g., "代码解释器").

These additions enhance the model's discoverability and provide more comprehensive and structured information for users.

Files changed (1) hide show
  1. README.md +11 -3
README.md CHANGED
@@ -1,23 +1,31 @@
1
  ---
2
- license: apache-2.0
3
  language:
4
  - zh
5
  - en
6
- pipeline_tag: text-generation
7
  library_name: transformers
 
 
 
 
8
  ---
 
9
  <div align="center">
10
  <img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img>
11
  </div>
12
 
13
  <p align="center">
14
  <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
 
 
15
  <a href="https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">Technical Report</a>
16
  </p>
17
  <p align="center">
18
  👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
19
  </p>
20
 
 
 
 
21
  ## What's New
22
  - [2025.06.06] **MiniCPM4** series are released! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips! You can find technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf).🔥🔥🔥
23
 
@@ -314,4 +322,4 @@ MiniCPM4 is pre-trained on 32K long texts and achieves length extension through
314
  author={MiniCPM Team},
315
  year={2025}
316
  }
317
- ```
 
1
  ---
 
2
  language:
3
  - zh
4
  - en
 
5
  library_name: transformers
6
+ license: apache-2.0
7
+ pipeline_tag: text-generation
8
+ tags:
9
+ - code-generation
10
  ---
11
+
12
  <div align="center">
13
  <img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img>
14
  </div>
15
 
16
  <p align="center">
17
  <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
18
+ <a href="https://huggingface.co/papers/2506.07900" target="_blank">Paper</a> |
19
+ <a href="https://huggingface.co/collections/openbmb/minicpm4-6841ab29d180257e940baa9b" target="_blank">Project Page</a> |
20
  <a href="https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">Technical Report</a>
21
  </p>
22
  <p align="center">
23
  👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
24
  </p>
25
 
26
+ ## Paper Abstract
27
+ This paper introduces MiniCPM4, a highly efficient large language model (LLM) designed explicitly for end-side devices. We achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems. Specifically, in terms of model architecture, we propose InfLLM v2, a trainable sparse attention mechanism that accelerates both prefilling and decoding phases for long-context processing. Regarding training data, we propose UltraClean, an efficient and accurate pre-training data filtering and generation strategy, and UltraChat v2, a comprehensive supervised fine-tuning dataset. These datasets enable satisfactory model performance to be achieved using just 8 trillion training tokens. Regarding training algorithms, we propose ModelTunnel v2 for efficient pre-training strategy search, and improve existing post-training methods by introducing chunk-wise rollout for load-balanced reinforcement learning and data-efficient tenary LLM, BitCPM. Regarding inference systems, we propose this http URL that integrates sparse attention, model quantization, and speculative sampling to achieve efficient prefilling and decoding. To meet diverse on-device requirements, MiniCPM4 is available in two versions, with 0.5B and 8B parameters, respectively. Furthermore, we construct a hybrid reasoning model, MiniCPM4.1, which can be used in both deep reasoning mode and non-reasoning mode. Evaluation results demonstrate that MiniCPM4 and MiniCPM4.1 outperform similar-sized open-source models across benchmarks, with the 8B variants showing significant speed improvements on long sequence understanding and generation.
28
+
29
  ## What's New
30
  - [2025.06.06] **MiniCPM4** series are released! This model achieves ultimate efficiency improvements while maintaining optimal performance at the same scale! It can achieve over 5x generation acceleration on typical end-side chips! You can find technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf).🔥🔥🔥
31
 
 
322
  author={MiniCPM Team},
323
  year={2025}
324
  }
325
+ ```