How to use from the
Use from the
llama-cpp-python library
# !pip install llama-cpp-python

from llama_cpp import Llama

llm = Llama.from_pretrained(
	repo_id="Abiray/BitCPM4-CANN-1B-GGUF",
	filename="",
)
llm.create_chat_completion(
	messages = "No input example has been defined for this model task."
)

Abiray/BitCPM4-CANN-1B-GGUF

This repository contains quantized GGUF formats of the openbmb/BitCPM4-CANN-1B model, heavily optimized for local inference using llama.cpp, text-generation-webui, LM Studio, Ollama, and other compatible backend frameworks.

Model Information

Available Files & Hardware Compatibility

The following quantization formats are available. Because this is a 1-Billion parameter model, it is highly efficient and can easily run on consumer CPUs, ultra-low-end hardware, and mobile devices.

Filename Quant Type File Size Description
BitCPM4-CANN-1B-Q8_0.gguf 8-bit 1.73 GB Highest quality, almost indistinguishable from the unquantized base model. Very fast inference.
BitCPM4-CANN-1B-Q6_K.gguf 6-bit 1.33 GB Excellent quality with near-zero noticeable degradation. Highly recommended.
BitCPM4-CANN-1B-Q5_K_M.gguf 5-bit 1.16 GB Great balance of file size, text generation speed, and logic retention.
BitCPM4-CANN-1B-Q5_K_S.gguf 5-bit 1.14 GB Minor variant of Q5_K_M optimized slightly more for size.
BitCPM4-CANN-1B-Q4_K_M.gguf 4-bit 1.00 GB Recommended. The ideal sweet-spot for 4-bit formats, striking an incredible performance-to-size ratio.
BitCPM4-CANN-1B-Q4_K_S.gguf 4-bit 958 MB Extremely small and fast. Drops below the 1GB mark, making it perfect for lightweight deployments.
BitCPM4-CANN-1B-Q3_K_M.gguf 3-bit 824 MB Maximum compression. Use only if working under severe memory bottlenecks.

How to Run

Using llama.cpp (Command Line)

If you have compiled llama.cpp, you can run the model directly from your terminal. Replace the filename with the specific version you downloaded:

./llama-cli \
  -m BitCPM4-CANN-1B-Q4_K_M.gguf \
  -p "Explain the concept of artificial intelligence to a five-year-old." \
  -n 256 \
  -c 2048 \
  --temp 0.7
Downloads last month
125
GGUF
Model size
2B params
Architecture
llama
Hardware compatibility
Log In to add your hardware

3-bit

4-bit

5-bit

6-bit

8-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for Abiray/BitCPM4-CANN-1B-GGUF

Quantized
(2)
this model

Collection including Abiray/BitCPM4-CANN-1B-GGUF