
Libraries

How to use openbmb/BitCPM-CANN-1B-unquantized with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/BitCPM-CANN-1B-unquantized", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("openbmb/BitCPM-CANN-1B-unquantized", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("openbmb/BitCPM-CANN-1B-unquantized", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use openbmb/BitCPM-CANN-1B-unquantized with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/BitCPM-CANN-1B-unquantized"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/BitCPM-CANN-1B-unquantized",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
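
The same request can be issued from Python. The following is a minimal sketch using only the standard library; it assumes the vLLM server started above is listening on `localhost:8000`, and the helper names `build_chat_request` and `chat` are illustrative, not part of vLLM.

```python
import json
import urllib.request

MODEL_ID = "openbmb/BitCPM-CANN-1B-unquantized"


def build_chat_request(prompt, base_url="http://localhost:8000", model=MODEL_ID):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt, **kwargs):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt, **kwargs)) as resp:
        body = json.load(resp)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat("What is the capital of France?")` should return the reply text. Passing `base_url="http://localhost:30000"` targets a server on a different port.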

Use Docker

docker model run hf.co/openbmb/BitCPM-CANN-1B-unquantized

SGLang

How to use openbmb/BitCPM-CANN-1B-unquantized with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openbmb/BitCPM-CANN-1B-unquantized" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/BitCPM-CANN-1B-unquantized",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/BitCPM-CANN-1B-unquantized" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/BitCPM-CANN-1B-unquantized",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use openbmb/BitCPM-CANN-1B-unquantized with Docker Model Runner:
```
docker model run hf.co/openbmb/BitCPM-CANN-1B-unquantized
```

BitCPM-CANN-1B-unquantized / example /README.md

guanwenyu1995

Update example naming from BitCPM4 to BitCPM

309e7bb verified about 14 hours ago


	# BitCPM Training Example

	This project provides scripts for continued pretraining (CPT) and supervised fine-tuning (SFT) of BitCPM-CANN-1B-unquantized.

	## File Description

	CPT and SFT each have a pair of scripts (training script + launch script) and share DeepSpeed configuration files:

	| File | Description |
	| --- | --- |
	| `train.py` | Continued-pretraining script based on HuggingFace Trainer + DeepSpeed |
	| `run.sh` | Launch script for CPT with hyperparameter configuration |
	| `train_sft.py` | Supervised fine-tuning script based on HuggingFace Trainer + DeepSpeed |
	| `run_sft.sh` | Launch script for SFT with hyperparameter configuration |
	| `ds_config.json` | DeepSpeed ZeRO-3 configuration (with CPU offload) |
	| `ds_config_z2.json` | DeepSpeed ZeRO-2 configuration (used by default) |
	| `requirements.txt` | Python dependency list |
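
The two DeepSpeed files differ mainly in the ZeRO stage: ZeRO-2 shards optimizer state and gradients across devices, while ZeRO-3 also shards the parameters and can offload them to CPU. The sketch below shows what such configurations typically look like when used with the HuggingFace Trainer. The values are assumptions for illustration and the repository's `ds_config_z2.json` and `ds_config.json` may differ. The function name `to_zero3_with_cpu_offload` is hypothetical.

```python
import copy
import json

# Illustrative ZeRO-2 config (assumed values, not copied from the repository).
# "auto" lets the HuggingFace Trainer fill in values from its own arguments.
zero2_config = {
    "bf16": {"enabled": "auto"},
    "zero_optimization": {
        "stage": 2,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}


def to_zero3_with_cpu_offload(config):
    """Return a copy of `config` switched to ZeRO-3 with CPU offload."""
    new_config = copy.deepcopy(config)
    zero = new_config["zero_optimization"]
    zero["stage"] = 3
    # Offload optimizer state and parameters to host memory to save device memory.
    zero["offload_optimizer"] = {"device": "cpu", "pin_memory": True}
    zero["offload_param"] = {"device": "cpu", "pin_memory": True}
    return new_config


def to_json(config):
    """Serialize a config dict in the format DeepSpeed reads from disk."""
    return json.dumps(config, indent=2)
```

ZeRO-2 is usually the faster choice when the model fits in device memory, which is consistent with it being the default here. ZeRO-3 with offload trades speed for a smaller memory footprint.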

	## Environment Setup

	### Docker Image

	Use the following Huawei NPU image:

	```
	swr.cn-south-1.myhuaweicloud.com/ascendhub/mindspeed-llm:openeuler22.03-mindspeed-llm-2.3.0-a3-arm
	```

	Other Huawei NPU images may also work but have not been fully tested. In GPU environments, you can skip the Docker image and install the dependencies from `requirements.txt` directly.

	### Install Dependencies

	After entering the container, install the Python dependencies:

	```bash
	pip install -r requirements.txt
	```

	## Continued Pretraining (CPT)

	### Dataset

	The dataset used for testing is [C4-Pro](https://huggingface.co/datasets/gair-prox/c4-pro), which is stored in Parquet format once downloaded.

	### Usage

	Modify the path configuration in `run.sh`:

	```bash
	MODEL_PATH="/path/to/BitCPM-CANN-1B-unquantized/"
	DATA_PATH="/path/to/c4-pro/data/your_file.parquet"
	```

	Then start training:

	```bash
	bash run.sh
	```
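
Before launching a long run, it can help to confirm that both paths point where you expect. The following is a small preflight sketch using only the standard library. The function name `check_paths` is hypothetical and is not part of the provided scripts. It relies on the fact that a HuggingFace model directory contains a `config.json`.

```python
from pathlib import Path


def check_paths(model_path, data_path):
    """Return a list of problems; an empty list means the paths look usable."""
    problems = []
    model_dir = Path(model_path)
    data_file = Path(data_path)

    # The model directory must exist and hold a HuggingFace config.json.
    if not model_dir.is_dir():
        problems.append(f"MODEL_PATH is not a directory: {model_dir}")
    elif not (model_dir / "config.json").is_file():
        problems.append(f"no config.json under MODEL_PATH: {model_dir}")

    # The data path must be an existing .parquet file.
    if not data_file.is_file():
        problems.append(f"DATA_PATH is not a file: {data_file}")
    elif data_file.suffix != ".parquet":
        problems.append(f"DATA_PATH is not a .parquet file: {data_file}")

    return problems
```

Calling `check_paths(MODEL_PATH, DATA_PATH)` with the same values set in the launch script returns an empty list when both look valid.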

	## Supervised Fine-Tuning (SFT)

	### Dataset

	The dataset used for testing is [UltraChat 200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k), which is stored in Parquet format once downloaded.

	### Usage

	Modify the path configuration in `run_sft.sh`:

	```bash
	MODEL_PATH="/path/to/BitCPM-CANN-1B-unquantized/"
	DATA_PATH="/path/to/ultrachat_200k/data/your_file.parquet"
	```

	Then start training:

	```bash
	bash run_sft.sh
	```
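
UltraChat 200k stores each conversation as a `messages` list of `{"role", "content"}` entries. A common SFT convention is to compute the loss only on assistant tokens by setting the labels of all other tokens to `-100`, the default ignored index of PyTorch's cross-entropy loss. The sketch below illustrates that convention with a toy tokenizer. Whether `train_sft.py` masks labels this way is not stated here, so treat it as an assumption. `toy_tokenize` and `build_example` are hypothetical names.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def toy_tokenize(text):
    """Stand-in tokenizer: one deterministic fake id per whitespace-separated word."""
    return [sum(map(ord, word)) % 1000 for word in text.split()]


def build_example(messages):
    """Concatenate all turns; keep labels only for assistant tokens."""
    input_ids, labels = [], []
    for message in messages:
        ids = toy_tokenize(message["content"])
        input_ids.extend(ids)
        if message["role"] == "assistant":
            labels.extend(ids)  # these tokens contribute to the loss
        else:
            labels.extend([IGNORE_INDEX] * len(ids))  # masked out of the loss
    return {"input_ids": input_ids, "labels": labels}
```

A real pipeline would use the model's tokenizer and chat template instead of `toy_tokenize`, but the masking logic keeps the same shape.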

	## Training Results Reference

	> Note: BitCPM was trained on its own dataset and data mixture, so the loss is expected to keep decreasing when you train on open-source datasets.

	Below are the loss curves from smoke tests on GPU and NPU for both CPT and SFT. The GPU and NPU curves are highly consistent, indicating that users can run continued pretraining or fine-tuning on either type of device:

	| | GPU | NPU |
	| --- | --- | --- |
	| CPT | ![GPU Pretrain Loss](gpu_pretrain_loss.png) | ![NPU Pretrain Loss](npu_pretrain_loss.png) |
	| SFT | ![GPU SFT Loss](gpu_sft_loss.png) | ![NPU SFT Loss](npu_sft_loss.png) |

	Training log CSV files (corresponding to the loss curves above):

	| CSV File | Corresponding Loss Curve |
	| --- | --- |
	| [gpu_pretrain.csv](gpu_pretrain.csv) | GPU CPT |
	| [npu_pretrain.csv](npu_pretrain.csv) | NPU CPT |
	| [gpu_sft.csv](gpu_sft.csv) | GPU SFT |
	| [npu_sft.csv](npu_sft.csv) | NPU SFT |
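
To check the GPU/NPU agreement numerically rather than by eye, the logs can be compared step by step. The sketch below assumes each CSV has a header row with `step` and `loss` columns. The real column names are not documented here, so adjust `step_col` and `loss_col` to match the files. The function names are illustrative.

```python
import csv


def read_losses(csv_file, step_col="step", loss_col="loss"):
    """Map step -> loss from an open CSV file that has a header row."""
    reader = csv.DictReader(csv_file)
    return {int(row[step_col]): float(row[loss_col]) for row in reader}


def max_abs_gap(losses_a, losses_b):
    """Largest absolute loss difference over the steps both runs logged."""
    shared_steps = set(losses_a) & set(losses_b)
    if not shared_steps:
        raise ValueError("the two runs share no logged steps")
    return max(abs(losses_a[s] - losses_b[s]) for s in shared_steps)


# Example usage (assumes the CSV files are in the working directory):
#
#   with open("gpu_pretrain.csv", newline="") as gpu, \
#        open("npu_pretrain.csv", newline="") as npu:
#       print(max_abs_gap(read_losses(gpu), read_losses(npu)))
```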

	---

	These scripts provide a ready-to-use starting point for quantization-aware (QAT) continued pretraining and fine-tuning of BitCPM-CANN models, so you can adapt the model to your own data and tasks while preserving its ternary quantization constraints.
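
For readers unfamiliar with ternary quantization, the sketch below shows one widely used scheme: absmean quantization as popularized by BitNet b1.58, where each weight is mapped to -1, 0, or +1 times a shared scale. The exact quantizer used by BitCPM is not described here, so this is an illustrative assumption rather than the model's actual implementation. `ternarize` and `dequantize` are hypothetical names.

```python
def ternarize(weights, eps=1e-8):
    """Quantize a list of floats to codes in {-1, 0, 1} plus one shared scale."""
    # The scale is the mean absolute weight (absmean); eps avoids division by zero.
    scale = sum(abs(w) for w in weights) / len(weights) + eps
    # Round each scaled weight to the nearest integer, then clip to [-1, 1].
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Reconstruct approximate weights from ternary codes and the scale."""
    return [c * scale for c in codes]
```

In quantization-aware training, the forward pass typically uses the dequantized weights while gradients update the full-precision copy, which is why an unquantized checkpoint can be trained further and still be ternarized afterwards.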