MiniCPM5-1B (GGUF Quantizations)

This repository contains custom GGUF-format quantizations of the openbmb/MiniCPM5-1B model.

MiniCPM5-1B is a 1-billion-parameter Transformer built for on-device use, local deployment, and resource-constrained scenarios. It uses the standard LlamaForCausalLM architecture, features hybrid reasoning (built-in <think> tokens), and supports a 131k-token context window.
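
Because the model can emit its reasoning inside <think> tokens, downstream code often needs to separate that reasoning from the final answer. The following is a minimal sketch, assuming the reasoning is wrapped as <think>...</think>. The closing tag and the helper name split_reasoning are assumptions for illustration and are not confirmed by this model card.

```python
from __future__ import annotations

import re

# Assumption: reasoning is delimited by <think> ... </think>. The closing tag
# is not stated in the model card and may differ in practice.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning, answer) extracted from raw model output.

    Hypothetical helper: if no complete <think> block is found, the
    reasoning is empty and the whole text is treated as the answer.
    """
    match = _THINK_RE.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the first <think> block is kept as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Only the first <think> block is extracted here. Output that is cut off before the closing tag is returned unchanged as the answer, so a caller may want to handle that case separately.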

📦 Available Files and Quantizations

These files are quantized for efficient CPU and edge inference with llama.cpp.

| Filename | Format | Size | Description |
|---|---|---|---|
| minicpm5-1b-Q4_K_M.gguf | Q4_K_M | 657 MB | Excellent balance of performance and size. (Recommended for 4GB RAM/Mobile) |
| minicpm5-1b-Q5_K_M.gguf | Q5_K_M | 751 MB | Higher accuracy, slight increase in size. |
| minicpm5-1b-Q6_K.gguf | Q6_K | 851 MB | Near-perfect fidelity to the base model. |
| minicpm5-1b-Q8_0.gguf | Q8_0 | 1.1 GB | Maximum quantized quality; fast loading. |
| minicpm5-1b-f16.gguf | F16 | 2.1 GB | Unquantized master weight container. |
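
To make the size trade-off concrete, here is a small sketch that picks the largest listed file fitting a given size budget. It treats file size as a proxy for fidelity, following the descriptions above. The helper name largest_file_within and the decimal conversion of 1.1 GB and 2.1 GB to 1100 MB and 2100 MB are assumptions. File size is only a lower bound on memory use, since the context (KV cache) and runtime need additional RAM.

```python
from __future__ import annotations

# File sizes copied from the list above, in megabytes. The GB entries are
# converted assuming decimal units (1 GB = 1000 MB), which is an assumption.
QUANT_SIZES_MB = {
    "minicpm5-1b-Q4_K_M.gguf": 657,
    "minicpm5-1b-Q5_K_M.gguf": 751,
    "minicpm5-1b-Q6_K.gguf": 851,
    "minicpm5-1b-Q8_0.gguf": 1100,
    "minicpm5-1b-f16.gguf": 2100,
}


def largest_file_within(budget_mb: float) -> str | None:
    """Return the largest listed file whose size fits the budget, or None.

    Hypothetical helper: the budget covers the file only, so leave
    headroom for the context and the rest of the process.
    """
    fitting = {name: size for name, size in QUANT_SIZES_MB.items()
               if size <= budget_mb}
    if not fitting:
        return None
    # Larger file means less aggressive quantization among these files.
    return max(fitting, key=fitting.get)
```

For example, largest_file_within(800) selects the Q5_K_M file, while a budget below 657 MB yields None.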

🚀 Quick Start with llama.cpp

Because MiniCPM5-1B uses the standard Llama architecture, llama.cpp supports it out of the box. No custom forks or kernels are required.

1. Interactive CLI

To run the model directly in your terminal using CPU threads:

./llama-cli -m minicpm5-1b-Q4_K_M.gguf -p "Artificial intelligence and local model deployment are transforming technology because" -n 256 -t 4
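
The same invocation can be scripted. The sketch below builds the argument list for the command above and optionally runs it with subprocess. It uses only the flags shown in this README (-m, -p, -n, -t). The function names and the default binary path ./llama-cli are assumptions for illustration.

```python
import shlex
import subprocess


def build_llama_cli_args(model_path, prompt, n_predict=256, threads=4,
                         binary="./llama-cli"):
    """Build the llama-cli argument list used in this README.

    Hypothetical helper: -m is the model file, -p the prompt,
    -n the number of tokens to generate, -t the CPU thread count.
    """
    return [binary, "-m", model_path, "-p", prompt,
            "-n", str(n_predict), "-t", str(threads)]


def run_llama_cli(args):
    """Run the command; needs a built llama-cli binary and the GGUF file."""
    return subprocess.run(args, check=True)


args = build_llama_cli_args(
    "minicpm5-1b-Q4_K_M.gguf",
    "Artificial intelligence and local model deployment are "
    "transforming technology because",
)
# Print a shell-quoted version of the command for inspection.
print(shlex.join(args))
```

Passing the arguments as a list avoids shell quoting problems with the prompt. Call run_llama_cli(args) once the binary and model file are present in the working directory.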