Abiray/BitCPM4-CANN-8B-GGUF

This repository contains GGUF quantizations of the openbmb/BitCPM4-CANN-8B model for local inference with llama.cpp, text-generation-webui, LM Studio, Ollama, and other GGUF-compatible backends.

Model Information

Available Files & Hardware Compatibility

The following quantizations are available. At 8 billion parameters, the model handles reasoning, coding, and comprehension tasks well while remaining small enough to run entirely on consumer GPUs (such as the RTX 3060/4060 series) or from system RAM. Compare each file size against your available VRAM or RAM before choosing a variant.

| Filename | Quant Type | File Size | Description |
|---|---|---|---|
| BitCPM4-CANN-8B-Q8_0.gguf | 8-bit | 8.70 GB | Extremely high fidelity. Practically identical to the unquantized base model. Recommended if you have 12 GB or more of VRAM/RAM. |
| BitCPM4-CANN-8B-Q6_K.gguf | 6-bit | 6.72 GB | Near-zero degradation. Highly stable for complex instructions. |
| BitCPM4-CANN-8B-Q5_K_M.gguf | 5-bit | 5.81 GB | Highly recommended balance of file size, generation speed, and response accuracy. |
| BitCPM4-CANN-8B-Q5_K_S.gguf | 5-bit | 5.67 GB | A slightly lighter 5-bit variant that trades some accuracy on edge-case logic for speed. |
| BitCPM4-CANN-8B-Q4_K_M.gguf | 4-bit | 4.97 GB | Recommended. The sweet spot for 8B models: keeps the file under 5 GB while preserving most of the model's quality. |
| BitCPM4-CANN-8B-Q4_K_S.gguf | 4-bit | 4.72 GB | Optimized for speed and low memory use. Suited to constrained environments. |
| BitCPM4-CANN-8B-Q3_K_M.gguf | 3-bit | 4.02 GB | Smallest file offered. Fits easily on lower-tier hardware, though some degradation on complex logic may occur. |
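
As a rough aid for choosing among the files above, the sketch below picks the largest quantization whose file size, plus some headroom, fits a given memory budget. The function name `pick_quant` is hypothetical. The default headroom of 1.5 GB (for the context cache and runtime overhead) is an assumption, not a measured figure, so adjust it for your setup.

```shell
#!/usr/bin/env bash
# pick_quant BUDGET_GB [HEADROOM_GB]
# Prints the largest file from the table that fits in BUDGET_GB once
# HEADROOM_GB is reserved. Returns non-zero if nothing fits.
# The 1.5 GB default headroom is an assumed allowance, not a measurement.
pick_quant() {
  local budget="$1" headroom="${2:-1.5}"
  # Rows are listed largest first, so the first match is the best fit.
  awk -v budget="$budget" -v headroom="$headroom" '
    $2 + headroom <= budget { print $1; found = 1; exit }
    END { if (!found) exit 1 }
  ' <<'EOF'
BitCPM4-CANN-8B-Q8_0.gguf 8.70
BitCPM4-CANN-8B-Q6_K.gguf 6.72
BitCPM4-CANN-8B-Q5_K_M.gguf 5.81
BitCPM4-CANN-8B-Q5_K_S.gguf 5.67
BitCPM4-CANN-8B-Q4_K_M.gguf 4.97
BitCPM4-CANN-8B-Q4_K_S.gguf 4.72
BitCPM4-CANN-8B-Q3_K_M.gguf 4.02
EOF
}
```

For example, `pick_quant 8` evaluates the table against an 8 GB budget, and `pick_quant 8 0.5` does the same with a smaller reserve.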

How to Run

Using llama.cpp (Command Line)

If you have compiled llama.cpp, you can run the model directly from your terminal. Replace the filename with the specific version you downloaded:

./llama-cli \
  -m BitCPM4-CANN-8B-Q4_K_M.gguf \
  -p "Explain the concept of artificial intelligence to a five-year-old." \
  -n 256 \
  -c 2048 \
  --temp 0.7
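
If you switch between files often, a small wrapper can keep the flags in one place. The sketch below is illustrative only: `run_model`, `LLAMA_CLI`, and `DRY_RUN` are hypothetical names, not part of llama.cpp. It reuses the flags from the command above and optionally appends `-ngl` (`--n-gpu-layers`), which sets how many layers llama.cpp offloads to the GPU.

```shell
#!/usr/bin/env bash
# run_model MODEL PROMPT [GPU_LAYERS]
# Hypothetical wrapper around llama-cli using the same flags as the command above.
# Set LLAMA_CLI to the binary path if it is not ./llama-cli.
# Set DRY_RUN=1 to print the command instead of running it.
run_model() {
  local model="$1" prompt="$2" ngl="${3:-}"
  local cmd=("${LLAMA_CLI:-./llama-cli}" -m "$model" -p "$prompt" -n 256 -c 2048 --temp 0.7)
  # Only pass -ngl when a layer count is given; otherwise llama-cli uses its own default.
  if [ -n "$ngl" ]; then
    cmd+=(-ngl "$ngl")
  fi
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%q ' "${cmd[@]}"
    echo
  else
    "${cmd[@]}"
  fi
}
```

Calling `DRY_RUN=1 run_model BitCPM4-CANN-8B-Q4_K_M.gguf "Hello" 99` prints the assembled command so you can check it before running it for real.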

GGUF model size: 8B params
Architecture: minicpm