Instructions to use openbmb/MiniCPM-MoE-8x2B with libraries, inference providers, notebooks, and local apps. Follow these links to get started.

Libraries

How to use openbmb/MiniCPM-MoE-8x2B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/MiniCPM-MoE-8x2B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM-MoE-8x2B", trust_remote_code=True, dtype="auto")

Notebooks
Google Colab
Kaggle
Local Apps

vLLM

How to use openbmb/MiniCPM-MoE-8x2B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/MiniCPM-MoE-8x2B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM-MoE-8x2B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker

docker model run hf.co/openbmb/MiniCPM-MoE-8x2B

SGLang

How to use openbmb/MiniCPM-MoE-8x2B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM-MoE-8x2B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM-MoE-8x2B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/MiniCPM-MoE-8x2B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM-MoE-8x2B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use openbmb/MiniCPM-MoE-8x2B with Docker Model Runner:
```
docker model run hf.co/openbmb/MiniCPM-MoE-8x2B
```

YAML Metadata Warning:empty or missing yaml metadata in repo card

Check out the documentation for more information.

Introduction

OpenBMB Technical Blog Series

The MiniCPM-MoE-8x2B is a decoder-only transformer-based generative language model.

The MiniCPM-MoE-8x2B adopt a Mixture-of-Experts(MoE) architecture, which has 8 experts per layer and activates 2 of 8 experts for each token.

Usage

This is a model version after instruction tuning but without other rlhf methods. Chat template is automatically applied.

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
torch.manual_seed(0)

path = 'openbmb/MiniCPM-MoE-8x2B'
tokenizer = AutoTokenizer.from_pretrained(path)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map='cuda', trust_remote_code=True)

responds, history = model.chat(tokenizer, "山东省最高的山是哪座山, 它比黄山高还是矮？差距多少？", temperature=0.8, top_p=0.8)
print(responds)

Note

You can alse inference with vLLM(>=0.4.1), which is compatible with this repo and has a much higher inference throughput.
The precision of model weights in this repo is bfloat16. Manual convertion is needed for other kinds of dtype.
For more details, please refer to our github repo.

Statement

As a language model, MiniCPM-MoE-8x2B generates content by learning from a vast amount of text.
However, it does not possess the ability to comprehend or express personal opinions or value judgments.
Any content generated by MiniCPM-MoE-8x2B does not represent the viewpoints or positions of the model developers.
Therefore, when using content generated by MiniCPM-MoE-8x2B, users should take full responsibility for evaluating and verifying it on their own.

Downloads last month: 2,992

Collection including openbmb/MiniCPM-MoE-8x2B

MiniCPM

Collection

The MiniCPM family of LLMs and VLLMs. • 33 items • Updated 10 days ago • 76