Paper: MiniCPM4: Ultra-Efficient LLMs on End Devices (arXiv:2506.07900)
⏳ Upcoming: The LiteRT (`.tflite`) build of MiniCPM5-1B is on the way. Model weights are not available in this repository yet. Please follow this repo to be notified when they land.
This repository will host the LiteRT (formerly TensorFlow Lite) version of MiniCPM5-1B, optimized for fully on-device inference on mobile and edge hardware.
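Until the weights land, one way to check for them programmatically is to list the repository's files and look for `.tflite` entries. The following is a minimal sketch, assuming the `huggingface_hub` package is installed and that the build will be published with a `.tflite` extension as stated above. The helper names `find_tflite_files` and `list_published_tflite` are hypothetical and not part of any library.

```python
from typing import Iterable, List

# Repository named in this card.
REPO_ID = "litert-community/MiniCPM5-1B"


def find_tflite_files(filenames: Iterable[str]) -> List[str]:
    """Return the entries that look like LiteRT model files (.tflite), sorted."""
    return sorted(name for name in filenames if name.lower().endswith(".tflite"))


def list_published_tflite(repo_id: str = REPO_ID) -> List[str]:
    """Query the Hugging Face Hub for .tflite files in the given repository.

    Assumption: requires `pip install huggingface_hub` and network access.
    """
    # Imported lazily so the pure helper above works without the dependency.
    from huggingface_hub import list_repo_files

    return find_tflite_files(list_repo_files(repo_id))
```

Calling `list_published_tflite()` returns an empty list while no `.tflite` file has been uploaded. The sketch only matches the `.tflite` extension, so a build shipped under a different file name or format would not be detected.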
MiniCPM5-1B is the first model in the MiniCPM5 series from OpenBMB. It is a dense 1B-parameter Transformer built specifically for on-device, local, and resource-constrained deployment, while reaching state-of-the-art results among open-source models in its size class.
`<think>` template (`enable_thinking`).

| Item | Value |
|---|---|
| Type | Causal Language Model |
| Architecture | Standard LlamaForCausalLM |
| Parameters | 1,080,632,832 (~1B) |
| Non-Embedding Parameters | 679,552,512 |
| Layers | 24 |
| Attention Heads (GQA) | 16 (Q) / 2 (KV) |
| Context Length | 131,072 |
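A few quantities follow directly from the table and are useful when budgeting memory on a device. The sketch below derives the embedding parameter count, the number of query heads sharing each KV head, and a rough KV-cache size estimate. The per-head dimension is not listed in this card, so `head_dim` is a required argument and the value 128 in the example is purely hypothetical. The 2-byte element size (16-bit cache) is likewise an assumption.

```python
# Values copied from the table above.
TOTAL_PARAMS = 1_080_632_832
NON_EMBEDDING_PARAMS = 679_552_512
NUM_LAYERS = 24
NUM_Q_HEADS = 16
NUM_KV_HEADS = 2
CONTEXT_LENGTH = 131_072

# Parameters not counted as "non-embedding" in the table.
embedding_params = TOTAL_PARAMS - NON_EMBEDDING_PARAMS

# Fraction of all parameters that sit in the embeddings.
embedding_share = embedding_params / TOTAL_PARAMS

# Grouped-query attention: query heads that share one KV head.
gqa_group_size = NUM_Q_HEADS // NUM_KV_HEADS


def kv_cache_bytes(seq_len: int, head_dim: int, bytes_per_value: int = 2) -> int:
    """Estimate KV-cache size for one sequence.

    Two tensors (K and V) are cached per layer, each holding
    NUM_KV_HEADS * head_dim values per token.
    Assumption: head_dim is NOT given in the card and must be supplied;
    bytes_per_value=2 assumes a 16-bit cache.
    """
    return 2 * NUM_LAYERS * NUM_KV_HEADS * head_dim * seq_len * bytes_per_value


# Hypothetical example: head_dim=128 is an illustrative value, not a published one.
example_cache_gib = kv_cache_bytes(CONTEXT_LENGTH, head_dim=128) / 2**30
```

With only 2 KV heads, the cache is 8 times smaller than it would be if all 16 query heads kept their own keys and values, which is the main memory benefit of GQA at long context lengths.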
Released under the Apache-2.0 License, consistent with the upstream openbmb/MiniCPM5-1B.
@article{minicpm4,
title={MiniCPM4: Ultra-efficient LLMs on end devices},
author={MiniCPM, Team},
journal={arXiv preprint arXiv:2506.07900},
year={2025}
}