Instructions for using openbmb/cpm-bee-10b with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use openbmb/cpm-bee-10b with Transformers:
```
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/cpm-bee-10b", trust_remote_code=True)

# Load model directly
from transformers import AutoModel

model = AutoModel.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True, dtype="auto")
```
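Once the pipeline is loaded, you can prompt it directly. A minimal sketch, assuming the model's remote code loads cleanly in your Transformers version; the prompt and `max_new_tokens` value are illustrative, not from the model card:

```
# Hypothetical usage of the pipeline created above
output = pipe("Once upon a time,", max_new_tokens=64)
print(output[0]["generated_text"])
```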
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use openbmb/cpm-bee-10b with vLLM:
Install from pip and serve the model
```
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "openbmb/cpm-bee-10b"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/cpm-bee-10b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
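Because the server exposes an OpenAI-compatible API, you can also call it from Python. A minimal sketch using the `openai` client package; the `base_url`, placeholder `api_key`, and sampling parameters mirror the curl call above and are assumptions, not part of the model card:

```
from openai import OpenAI

# Point the client at the local vLLM server (OpenAI-compatible endpoint).
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="openbmb/cpm-bee-10b",
    prompt="Once upon a time,",
    max_tokens=512,
    temperature=0.5,
)
print(completion.choices[0].text)
```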
Use Docker

```
docker model run hf.co/openbmb/cpm-bee-10b
```
- SGLang
How to use openbmb/cpm-bee-10b with SGLang:
Install from pip and serve the model
```
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "openbmb/cpm-bee-10b" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/cpm-bee-10b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```
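The SGLang server speaks the same OpenAI-compatible protocol, so a plain HTTP call also works. A minimal sketch using `requests` (an assumption, not part of the model card); the payload mirrors the curl example above:

```
import requests

# Same payload as the curl example, sent to the local SGLang server.
response = requests.post(
    "http://localhost:30000/v1/completions",
    json={
        "model": "openbmb/cpm-bee-10b",
        "prompt": "Once upon a time,",
        "max_tokens": 512,
        "temperature": 0.5,
    },
)
print(response.json()["choices"][0]["text"])
```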
Use Docker images

```
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "openbmb/cpm-bee-10b" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "openbmb/cpm-bee-10b",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

- Docker Model Runner
How to use openbmb/cpm-bee-10b with Docker Model Runner:
```
docker model run hf.co/openbmb/cpm-bee-10b
```
Error while running it with Hugging Face Transformers
Error type 1 - Installing transformers from the code in your forked GitHub repo
```
/CPMBee-fork-transformer/transformers/src/transformers/models/cpmbee/modeling_cpmbee.py:572 in forward

  569 │     self.inv_freq = inv_freq.to(config.torch_dtype)
  570 │
  571 │ def forward(self, x: torch.Tensor, x_pos: torch.Tensor):
❱ 572 │     inv_freq = self.inv_freq.to(device=x.device, dtype=self.dtype)
  573 │
  574 │     x_pos = x_pos * self.distance_scale
  575 │     freqs = x_pos[..., None].to(self.dtype) * inv_freq[None, :]  # (..., dim/2)

RuntimeError: CUDA error: device-side assert triggered
```
Error type 2 - Using the code in this Hugging Face repo with trust_remote_code=True
```
modeling_cpmbee.py:787 in forward

  784 │             + segment_rel_offset[:, :, None],
  785 │             ~(
  786 │                 (sample_ids[:, :, None] == sample_ids[:, None, :])
❱ 787 │                 & (span[:, None, :] == span[:, :, None])
  788 │             ),  # not in the same span or sample
  789 │             0,  # avoid torch.gather overflow
  790 │         ).view(batch, seqlen * seqlen)

TypeError: 'NoneType' object is not subscriptable
```
You should not use the forked GitHub code; please follow the example in the model card.
For error type 2, please use model.generate(). If you call model.forward() directly, you must first preprocess the data with tokenizer.prepare_for_finetune().
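Concretely, a minimal sketch of the generate() pattern, assuming CPM-Bee's remote code accepts its dict-style input with the tokenizer passed alongside; the prompt text and exact dict keys should be checked against the model card:

```
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openbmb/cpm-bee-10b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "openbmb/cpm-bee-10b", trust_remote_code=True
).cuda()

# CPM-Bee's remote code takes a dict with the prompt under "input" and an
# empty "<ans>" slot for the model to fill in (assumed model-card pattern).
result = model.generate({"input": "Once upon a time,", "<ans>": ""}, tokenizer)
print(result)

# For model.forward(), the inputs must first go through
# tokenizer.prepare_for_finetune() as noted above (signature not shown here).
```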