---
license: apache-2.0
language:
- zh
- en
pipeline_tag: text-generation
library_name: transformers
---
<div align="center">
<img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img> 
</div>

<p align="center">
<a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
<a href="https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf" target="_blank">Technical Report</a> 
</p>
<p align="center">
👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
</p>

## Overview

BitCPM-CANN-1B-unquantized is the **unquantized QAT (quantization-aware training) checkpoint** of BitCPM-CANN-1B, intended for **continued pre-training and fine-tuning**. It keeps full-precision latent weights and applies the ternary fake quantizers defined in `modeling.py` during the forward pass: weights are mapped to {-1, 0, 1} with group-wise scaling, and training uses the straight-through estimator (STE). This lets the model keep learning under quantization constraints. For technical details, see our [Technical Report](https://github.com/OpenBMB/MiniCPM/blob/main/docs/BitCPM_CANN.pdf).
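
To make the idea of a ternary fake quantizer trained via STE concrete, here is a minimal pure-Python sketch. It is not the code in `modeling.py`: the function names, the flat-list weight layout, and the absmean scale rule (scale = mean absolute value of each group) are assumptions chosen for illustration.

```python
def ternary_fake_quant(weights, group_size):
    """Fake-quantize a flat list of floats to scale * {-1, 0, 1}, group by group."""
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Assumed scale rule: mean absolute value of the group (absmean).
        scale = sum(abs(w) for w in group) / len(group)
        if scale == 0.0:
            out.extend(0.0 for _ in group)
            continue
        for w in group:
            code = max(-1, min(1, round(w / scale)))  # ternary code in {-1, 0, 1}
            out.append(code * scale)
    return out


def ste_sgd_step(latent, x, target, group_size, lr):
    """One SGD step on the latent weights for loss = (w_q . x - target)^2."""
    w_q = ternary_fake_quant(latent, group_size)
    pred = sum(wq * xi for wq, xi in zip(w_q, x))
    # Gradient of the loss with respect to the quantized weights.
    grad_wq = [2.0 * (pred - target) * xi for xi in x]
    # STE: treat d(w_q)/d(latent) as the identity, so the same gradient
    # is applied directly to the full-precision latent weights.
    return [w - lr * g for w, g in zip(latent, grad_wq)]


latent = [0.9, -0.05, -1.1, 0.4]
print(ternary_fake_quant(latent, group_size=2))
print(ste_sgd_step(latent, x=[1.0, 1.0, 1.0, 1.0], target=0.0, group_size=2, lr=0.1))
```

The forward pass only ever sees the quantized values, while the update lands on the latent weights. This is why the checkpoint has to keep the latent weights in full precision.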

> ⚠️ **This model is NOT for direct inference.** For inference, use the pseudo-quantized version: [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B).

## Continued Pre-training & Fine-tuning

The **only requirement** is that the forward pass must go through the bundled `modeling.py` (which contains the ternary fake quantizer). Load with `trust_remote_code=True` and do NOT replace or bypass the model's forward logic.
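
A quick sanity check before training can catch the case where a framework silently loaded a built-in model class instead of the bundled code. The helper below is a hypothetical sketch, not part of this repository. It assumes that Transformers places classes loaded through `trust_remote_code=True` under a top-level package named `transformers_modules`; verify this against your installed version.

```python
def loaded_from_remote_code(model) -> bool:
    """Heuristic: True if the model class comes from dynamically loaded repository code."""
    # Assumption: classes loaded via trust_remote_code live under the
    # top-level package "transformers_modules".
    return type(model).__module__.split(".")[0] == "transformers_modules"


# Stand-in objects so the sketch runs without downloading a model.
RemoteLike = type("RemoteLike", (), {"__module__": "transformers_modules.repo.modeling"})
BuiltinLike = type("BuiltinLike", (), {"__module__": "transformers.models.llama.modeling_llama"})
print(loaded_from_remote_code(RemoteLike()), loaded_from_remote_code(BuiltinLike()))
```

In practice you would call `loaded_from_remote_code(model)` right after `from_pretrained` and stop if it returns `False`. The check only confirms where the class came from, not that the forward logic was left untouched.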

### Option 1: DeepSpeed (Recommended)

We provide ready-to-use training scripts in the [example](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized/tree/main/example) directory (using the 1B model as an example):

- **Continued pre-training**: `example/run.sh` + `example/train.py`
- **SFT (Supervised Fine-tuning)**: `example/run_sft.sh` + `example/train_sft.py`

Quick start:

```bash
# Continued pre-training
cd example && bash run.sh

# Supervised fine-tuning
cd example && bash run_sft.sh
```

### Option 2: HuggingFace-compatible Frameworks

Any framework that can load HuggingFace models with custom code works, for example **LLaMA Factory** or **HuggingFace Trainer**. The key is to load the model with `trust_remote_code=True`:

```python
import torch  # needed for torch.bfloat16 below
from transformers import AutoModelForCausalLM, AutoTokenizer

path = 'openbmb/BitCPM-CANN-1B-unquantized'
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True
)

# Use with your preferred framework (LLaMA Factory, HF Trainer, etc.)
# The ternary fake quantizer in modeling.py is applied automatically during forward pass.
```

## Post-Training Conversion

After training, use `qat-convert.py` to fuse the fake quantizer and produce inference-ready pseudo-quantized weights:

```bash
python qat-convert.py \
    --input_bin <path-to-finetuned-pytorch.bin> \
    --output <path-to-output-pseudo-quantized-pytorch.bin> \
    --quant_type ternary \
    --group_size -1
```

The converted model can be loaded for inference in the same way as [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B). No special quantization libraries are required.
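
One way to sanity-check a converted weight tensor is to confirm that each group only contains values near `-s`, `0`, or `+s` for a single scale `s`. The helper below is a hypothetical sketch on a flat list of floats. It does not read `pytorch.bin` files, it takes an explicit positive group size rather than interpreting `--group_size -1`, and the tolerance is an assumption that may need loosening for low-precision storage.

```python
def looks_pseudo_quantized(weights, group_size, tol=1e-6):
    """Return True if every group only holds values close to -s, 0, or +s."""
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group)  # candidate scale s for this group
        for w in group:
            # Distance to the nearest allowed level: 0 or +/- scale.
            if min(abs(w), abs(abs(w) - scale)) > tol:
                return False
    return True


print(looks_pseudo_quantized([0.475, 0.0, -0.75, 0.75], group_size=2))
print(looks_pseudo_quantized([0.9, -0.05, -1.1, 0.4], group_size=2))
```

The first list has the ternary structure within each group of two and the second does not, so a check like this can flag a checkpoint that still holds un-fused latent weights.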

## Workflow

```
┌─────────────────────────────────┐
│  BitCPM-CANN-1B-unquantized     │   ← This model (QAT checkpoint + fake quantizer in modeling.py)
└───────────────┬─────────────────┘
                │
                ▼  Train (DeepSpeed / LLaMA Factory / HF Trainer / ...)
┌─────────────────────────────────┐
│  Fine-tuned checkpoint          │   ← Still contains un-fused QAT parameters
└───────────────┬─────────────────┘
                │
                ▼  python qat-convert.py --quant_type ternary --group_size -1
┌─────────────────────────────────┐
│  Pseudo-quantized model         │   ← Ready for inference (same format as BitCPM-CANN-1B)
└─────────────────────────────────┘
```

## BitCPM-CANN Model Family

| Model | HuggingFace (Inference) | HuggingFace (Fine-tuning) |
|-------|-------------------------|---------------------------|
| BitCPM-CANN-0.5B | [openbmb/BitCPM-CANN-0.5B](https://huggingface.co/openbmb/BitCPM-CANN-0.5B) | [openbmb/BitCPM-CANN-0.5B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-0.5B-unquantized) |
| BitCPM-CANN-1B | [openbmb/BitCPM-CANN-1B](https://huggingface.co/openbmb/BitCPM-CANN-1B) | [openbmb/BitCPM-CANN-1B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-1B-unquantized) |
| BitCPM-CANN-3B | [openbmb/BitCPM-CANN-3B](https://huggingface.co/openbmb/BitCPM-CANN-3B) | [openbmb/BitCPM-CANN-3B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-3B-unquantized) |
| BitCPM-CANN-8B | [openbmb/BitCPM-CANN-8B](https://huggingface.co/openbmb/BitCPM-CANN-8B) | [openbmb/BitCPM-CANN-8B-unquantized](https://huggingface.co/openbmb/BitCPM-CANN-8B-unquantized) |

## Statement
- As a language model, BitCPM-CANN generates content by learning from a vast amount of text. 
- However, it does not possess the ability to comprehend or express personal opinions or value judgments. 
- Any content generated by BitCPM-CANN does not represent the viewpoints or positions of the model developers. 
- Therefore, when using content generated by BitCPM-CANN, users should take full responsibility for evaluating and verifying it on their own.

## LICENSE
- This repository and BitCPM-CANN models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License. 

## Citation
- Please cite our technical report if you find our work valuable.

```bibtex
@article{bitcpmcann,
  title={{BitCPM-CANN}: Native 1.58-Bit Large Language Model Training on Ascend NPU},
  author={BitCPM Team},
  year={2026}
}
```