Abiray/BitCPM4-CANN-3B-GGUF

This repository contains GGUF quantizations of the openbmb/BitCPM4-CANN-3B model for local inference with llama.cpp, text-generation-webui, LM Studio, Ollama, and other GGUF-compatible backends.

Model Information

Available Files & Hardware Compatibility

The following quantization formats are available. Choose the one that best fits your system's VRAM/RAM capacity and desired inference speed.

| Filename | Quant Type | File Size | Description |
|---|---|---|---|
| BitCPM4-CANN-3B-Q8_0.gguf | 8-bit | 3.83 GB | Highest quality, almost indistinguishable from the unquantized base model. Best for machines with ample memory (more than 5 GB of VRAM/RAM). |
| BitCPM4-CANN-3B-Q6_K.gguf | 6-bit | 2.96 GB | Excellent quality with minimal degradation. A good choice for capable consumer hardware. |
| BitCPM4-CANN-3B-Q5_K_M.gguf | 5-bit | 2.56 GB | Great balance of speed, file size, and reasoning capability. |
| BitCPM4-CANN-3B-Q5_K_S.gguf | 5-bit | 2.51 GB | Slightly faster and smaller than Q5_K_M, with a very minor loss in reasoning quality. |
| BitCPM4-CANN-3B-Q4_K_M.gguf | 4-bit | 2.19 GB | Recommended. The standard "sweet spot" between memory use and output quality for most users. |
| BitCPM4-CANN-3B-Q4_K_S.gguf | 4-bit | 2.09 GB | Fast and small. Good for lower-end hardware where memory is tight. |
| BitCPM4-CANN-3B-Q3_K_M.gguf | 3-bit | 1.79 GB | Lowest quality. Use only when memory is heavily constrained (systems with less than 3 GB of RAM). |
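
The choice above can be sketched as a small shell helper that returns the largest file fitting a given memory budget. This is only an illustration: `pick_quant` and `HEADROOM_MB` are hypothetical names, the sizes are the table's figures converted to decimal MB, and the 1024 MB headroom for the KV cache and runtime buffers is an assumption rather than a measured value.

```bash
#!/usr/bin/env bash
# Sketch: pick the largest quantization whose file fits a memory budget.
# File sizes (decimal MB) are copied from the table above.
# HEADROOM_MB is an assumed allowance for the KV cache and runtime buffers,
# not a measured requirement; adjust it for your context size.
HEADROOM_MB=1024

pick_quant() {
  local budget_mb=$1
  local entry name size
  # Entries are ordered from largest to smallest file.
  for entry in \
    "Q8_0:3830" "Q6_K:2960" "Q5_K_M:2560" "Q5_K_S:2510" \
    "Q4_K_M:2190" "Q4_K_S:2090" "Q3_K_M:1790"; do
    name=${entry%%:*}   # text before the colon
    size=${entry##*:}   # text after the colon
    if [ $((size + HEADROOM_MB)) -le "$budget_mb" ]; then
      echo "BitCPM4-CANN-3B-${name}.gguf"
      return 0
    fi
  done
  echo "no quantization fits in ${budget_mb} MB" >&2
  return 1
}

pick_quant 4096   # example: a 4 GB budget
```

Treat the result as a starting point. Actual memory use also depends on the context size and on how many layers are offloaded to the GPU.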

How to Run

Using llama.cpp (Command Line)

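If you have built llama.cpp, you can run the model directly from your terminal. Replace the filename with the one you downloaded. In the command below, -n caps the number of generated tokens, -c sets the context size, and --temp sets the sampling temperature: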

./llama-cli \
  -m BitCPM4-CANN-3B-Q4_K_M.gguf \
  -p "Explain the concept of artificial intelligence to a five-year-old." \
  -n 256 \
  -c 2048 \
  --temp 0.7
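
For repeated use, the command above could be wrapped in a small shell function that checks for the model file before launching. This is a sketch under stated assumptions: `run_model` and the `LLAMA_CLI` override variable are hypothetical names and not part of llama.cpp, and the flag values are simply the ones from the command above.

```bash
#!/usr/bin/env bash
# Sketch of a wrapper around the llama-cli command shown above.
# run_model and LLAMA_CLI are hypothetical names, not part of llama.cpp.

run_model() {
  local model=$1
  local prompt=$2

  # Fail early with a clear message if the file is missing.
  if [ ! -f "$model" ]; then
    echo "model file not found: $model" >&2
    return 1
  fi

  # Same flags as the command above: -n caps generated tokens,
  # -c sets the context size, --temp sets the sampling temperature.
  # LLAMA_CLI lets you point at a binary outside the current directory.
  "${LLAMA_CLI:-./llama-cli}" \
    -m "$model" \
    -p "$prompt" \
    -n 256 \
    -c 2048 \
    --temp 0.7
}
```

After sourcing the script, a call such as `run_model BitCPM4-CANN-3B-Q4_K_M.gguf "Explain the concept of artificial intelligence to a five-year-old."` runs the same command as above.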

Model size: 4B params
Architecture: llama