BitCPM-CANN-0.5B-unquantized / example / run.sh

guanwenyu1995

Update example naming from BitCPM4 to BitCPM

63e0b2d verified about 22 hours ago

1.13 kB

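	#!/bin/bash
	# Launch a short (100-step) training run of train.py on 8 devices with DeepSpeed.

	# Local paths; adjust to your environment.
	# Note: this path names a 1B checkpoint, although the repository hosts the 0.5B model.
	MODEL_PATH="/model/BitCPM-CANN-1B-unquantized"
	DATA_PATH="/dataset/c4-pro/data/000_1_7.parquet"
	OUTPUT_DIR="./output"
	# DeepSpeed config (presumably ZeRO stage 2, judging by the file name).
	DS_CONFIG="./ds_config_z2.json"

	# Parallelism and batch shape.
	NUM_GPUS=8
	BATCH_SIZE_PER_GPU=8
	GRAD_ACCUM_STEPS=8
	MAX_SEQ_LENGTH=1024

	# Device selection: the first variable applies to Ascend NPUs, the second to CUDA GPUs.
	export ASCEND_RT_VISIBLE_DEVICES=8,9,10,11,12,13,14,15
	export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
	# Skip DeepSpeed's CUDA version check.
	export DS_SKIP_CUDA_CHECK=1

	# One process per device; variables are quoted so paths with spaces survive.
	torchrun --nproc_per_node="$NUM_GPUS" train.py \
		--model_name_or_path "$MODEL_PATH" \
		--data_path "$DATA_PATH" \
		--max_seq_length "$MAX_SEQ_LENGTH" \
		--output_dir "$OUTPUT_DIR" \
		--per_device_train_batch_size "$BATCH_SIZE_PER_GPU" \
		--gradient_accumulation_steps "$GRAD_ACCUM_STEPS" \
		--max_steps 100 \
		--learning_rate 4e-5 \
		--lr_scheduler_type cosine \
		--warmup_ratio 0.1 \
		--weight_decay 1e-2 \
		--logging_steps 2 \
		--save_steps 500 \
		--save_total_limit 3 \
		--bf16 \
		--deepspeed "$DS_CONFIG" \
		--gradient_checkpointing \
		--seed 42 \
		--dataloader_num_workers 4 \
		--report_to tensorboard \
		--logging_dir /data/tensorboard/pretrain \
		--gradient_checkpointing_kwargs '{"use_reentrant": false}'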
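
The batch settings in run.sh combine into an effective batch size that is easy to misjudge. The sketch below works through the arithmetic with the values copied from the script. It assumes that all 8 processes run as data-parallel replicas and that every sequence is filled to `MAX_SEQ_LENGTH`, so the token counts are upper bounds. The variable names `MAX_STEPS`, `SAVE_STEPS`, `WARMUP_PERCENT`, and the derived names are introduced here for illustration and do not appear in run.sh.

```bash
#!/bin/bash
# Values copied from run.sh.
NUM_GPUS=8
BATCH_SIZE_PER_GPU=8
GRAD_ACCUM_STEPS=8
MAX_SEQ_LENGTH=1024
MAX_STEPS=100        # --max_steps
SAVE_STEPS=500       # --save_steps
WARMUP_PERCENT=10    # --warmup_ratio 0.1, expressed as an integer percentage

# Sequences consumed per optimizer step (assumes pure data parallelism).
SEQS_PER_STEP=$(( NUM_GPUS * BATCH_SIZE_PER_GPU * GRAD_ACCUM_STEPS ))

# Upper bound on tokens per optimizer step (assumes full-length sequences).
TOKENS_PER_STEP=$(( SEQS_PER_STEP * MAX_SEQ_LENGTH ))

# Upper bound on tokens seen over the whole run.
TOTAL_TOKENS=$(( TOKENS_PER_STEP * MAX_STEPS ))

# Steps spent in learning-rate warmup.
WARMUP_STEPS=$(( MAX_STEPS * WARMUP_PERCENT / 100 ))

echo "sequences per step: $SEQS_PER_STEP"
echo "tokens per step:    $TOKENS_PER_STEP"
echo "total tokens:       $TOTAL_TOKENS"
echo "warmup steps:       $WARMUP_STEPS"

# Flag a save interval that is never reached during the run.
if [ "$SAVE_STEPS" -gt "$MAX_STEPS" ]; then
    echo "note: save_steps ($SAVE_STEPS) exceeds max_steps ($MAX_STEPS)"
fi
```

With the values above, each optimizer step covers 512 sequences, or at most 524288 tokens, and the 100-step run sees at most 52428800 tokens with 10 warmup steps. Because `--save_steps 500` is larger than `--max_steps 100`, the periodic save interval is never reached; whether a final checkpoint is written depends on train.py, which is not shown here.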