Instructions for using openbmb/MiniCPM3-4B with libraries and local apps.

Libraries

How to use openbmb/MiniCPM3-4B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="openbmb/MiniCPM3-4B", trust_remote_code=True)
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained("openbmb/MiniCPM3-4B", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use openbmb/MiniCPM3-4B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "openbmb/MiniCPM3-4B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
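
The same request can also be built from Python. The following is a minimal sketch using only the standard library; the helper names `build_chat_request` and `send_chat_request` and the `base_url` default are illustrative assumptions, and nothing is sent unless `send_chat_request` is called against a running server.

```python
import json
import urllib.request


def build_chat_request(prompt, model="openbmb/MiniCPM3-4B",
                       base_url="http://localhost:8000"):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # OpenAI-compatible responses put the reply under choices[0].message.content.
    return body["choices"][0]["message"]["content"]
```

With the server running, `send_chat_request(build_chat_request("What is the capital of France?"))` returns the reply. For the SGLang server shown further down, pass `base_url="http://localhost:30000"`.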

Use Docker

docker model run hf.co/openbmb/MiniCPM3-4B

SGLang

How to use openbmb/MiniCPM3-4B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "openbmb/MiniCPM3-4B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "openbmb/MiniCPM3-4B" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "openbmb/MiniCPM3-4B",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use openbmb/MiniCPM3-4B with Docker Model Runner:
```
docker model run hf.co/openbmb/MiniCPM3-4B
```

MiniCPM3-4B / README.md

harveydr

Update README.md

3263597 verified 10 months ago


	---
	license: apache-2.0
	language:
	- zh
	pipeline_tag: text-generation
	library_name: transformers
	tags:
	- medical
	---
	<div align="center">
	<img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img>
	</div>

	<p align="center">
	<a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">MiniCPM Repo</a> |
	<a href="https://arxiv.org/abs/2404.06395" target="_blank">MiniCPM Paper</a> |
	<a href="https://github.com/OpenBMB/MiniCPM-V/" target="_blank">MiniCPM-V Repo</a> |
	Join us in <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>

	</p>

	## Introduction
	MiniCPM3-4B is the third generation of the MiniCPM series. Its overall performance surpasses Phi-3.5-mini-Instruct and GPT-3.5-Turbo-0125 and is comparable to many recent 7B to 9B models.

	Compared to MiniCPM1.0 and MiniCPM2.0, MiniCPM3-4B has a more powerful and versatile skill set that enables more general use. It supports function calling and a code interpreter. Please refer to [Advanced Features](https://github.com/OpenBMB/MiniCPM/tree/main?tab=readme-ov-file#%E8%BF%9B%E9%98%B6%E5%8A%9F%E8%83%BD) for usage guidelines.

	MiniCPM3-4B has a 32k context window. Equipped with LLMxMapReduce, MiniCPM3-4B can in theory handle unbounded context without requiring a large amount of memory.
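
The general idea of handling input longer than the context window can be sketched as a map-reduce over chunks. This is only an illustration of the pattern, not the LLMxMapReduce implementation; `split_into_chunks`, `map_reduce`, and the `map_fn`/`reduce_fn` callables (which would wrap model calls in practice) are hypothetical names.

```python
def split_into_chunks(tokens, chunk_size):
    """Split a token list into consecutive chunks of at most chunk_size items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


def map_reduce(tokens, chunk_size, map_fn, reduce_fn):
    """Apply map_fn to each chunk, then combine the partial results with reduce_fn."""
    partials = [map_fn(chunk) for chunk in split_into_chunks(tokens, chunk_size)]
    return reduce_fn(partials)


# Toy stand-ins: count tokens per chunk, then sum the counts.
tokens = "the quick brown fox jumps over the lazy dog".split()
total = map_reduce(tokens, chunk_size=4, map_fn=len, reduce_fn=sum)
print(total)  # 9
```

Because each chunk is processed independently, only one chunk (plus the partial results) needs to fit in the context window at a time.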

	## Usage
	### Inference with Transformers
	```python
	from transformers import AutoModelForCausalLM, AutoTokenizer
	import torch

	path = "openbmb/MiniCPM3-4B"
	device = "cuda"

	tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
	model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)

	# Prompt (Chinese): "Recommend 5 tourist attractions in Beijing."
	messages = [
	    {"role": "user", "content": "推荐5个北京的景点。"},
	]
	model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)

	# Generate up to 1024 new tokens.
	model_outputs = model.generate(
	    model_inputs,
	    max_new_tokens=1024,
	    top_p=0.7,
	    temperature=0.7
	)

	# Drop the prompt tokens so that only the newly generated tokens are decoded.
	output_token_ids = [
	    model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
	]

	responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
	print(responses)
	```
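
The slicing step in the example above is needed because `generate` returns each prompt followed by its continuation. A minimal sketch with plain Python lists standing in for tensors (the helper name `strip_prompt` and the token IDs are made up for illustration):

```python
def strip_prompt(model_inputs, model_outputs):
    """Return only the tokens generated after each prompt."""
    return [
        output[len(prompt):]
        for prompt, output in zip(model_inputs, model_outputs)
    ]


# One prompt of four tokens, followed by three generated tokens.
model_inputs = [[101, 7, 8, 9]]
model_outputs = [[101, 7, 8, 9, 42, 43, 2]]
print(strip_prompt(model_inputs, model_outputs))  # [[42, 43, 2]]
```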

	### Inference with [vLLM](https://github.com/vllm-project/vllm)

	For now, you need to install our forked version of vLLM.

	```bash
	pip install git+https://github.com/OpenBMB/vllm.git@minicpm3
	```

	```python
	from transformers import AutoTokenizer
	from vllm import LLM, SamplingParams

	model_name = "openbmb/MiniCPM3-4B"
	# Prompt (Chinese): "Recommend 5 tourist attractions in Beijing."
	prompt = [{"role": "user", "content": "推荐5个北京的景点。"}]

	# Render the chat messages into a single prompt string.
	tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
	input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

	llm = LLM(
	    model=model_name,
	    trust_remote_code=True,
	    tensor_parallel_size=1
	)
	sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

	outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)

	print(outputs[0].outputs[0].text)
	```
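
For readers unfamiliar with the `temperature` and `top_p` settings used above, the following is a simplified sketch of what these parameters conventionally mean (temperature-scaled softmax followed by nucleus filtering). It is an illustration in plain Python, not vLLM's implementation, and `top_p_filter` is a hypothetical helper name.

```python
import math


def top_p_filter(logits, temperature, top_p):
    """Return {token_index: probability} for the smallest set of tokens
    whose cumulative probability reaches top_p, renormalized to sum to 1."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    kept, cumulative = [], 0.0
    for index in sorted(range(len(probs)), key=lambda i: probs[i], reverse=True):
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break

    kept_mass = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_mass for i in kept}


logits = [2.0, 1.0, 0.0]
print(sorted(top_p_filter(logits, temperature=1.0, top_p=0.7)))  # [0, 1]
print(sorted(top_p_filter(logits, temperature=0.7, top_p=0.7)))  # [0]
```

Lowering the temperature sharpens the distribution, so fewer candidate tokens survive the same `top_p` cutoff.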

	## Evaluation Results

	<table>
	<tr>
	<td>Benchmark</td>
	<td>Qwen2-7B-Instruct</td>
	<td>GLM-4-9B-Chat</td>
	<td>Gemma2-9B-it</td>
	<td>Llama3.1-8B-Instruct</td>
	<td>GPT-3.5-Turbo-0125</td>
	<td>Phi-3.5-mini-Instruct(3.8B)</td>
	<td>MiniCPM3-4B </td>
	</tr>
	<tr>
	<td colspan="15" align="left"><strong>English</strong></td>
	</tr>
	<tr>
	<td>MMLU</td>
	<td>70.5</td>
	<td>72.4</td>
	<td>72.6</td>
	<td>69.4</td>
	<td>69.2</td>
	<td>68.4</td>
	<td>67.2 </td>
	</tr>
	<tr>
	<td>BBH</td>
	<td>64.9</td>
	<td>76.3</td>
	<td>65.2</td>
	<td>67.8</td>
	<td>70.3</td>
	<td>68.6</td>
	<td>70.2 </td>
	</tr>
	<tr>
	<td>MT-Bench</td>
	<td>8.41</td>
	<td>8.35</td>
	<td>7.88</td>
	<td>8.28</td>
	<td>8.17</td>
	<td>8.60</td>
	<td>8.41 </td>
	</tr>
	<tr>
	<td>IFEVAL (Prompt Strict-Acc.)</td>
	<td>51.0</td>
	<td>64.5</td>
	<td>71.9</td>
	<td>71.5</td>
	<td>58.8</td>
	<td>49.4</td>
	<td>68.4 </td>
	</tr>
	<tr>
	<td colspan="15" align="left"><strong>Chinese</strong></td>
	</tr>
	<tr>
	<td>CMMLU</td>
	<td>80.9</td>
	<td>71.5</td>
	<td>59.5</td>
	<td>55.8</td>
	<td>54.5</td>
	<td>46.9</td>
	<td>73.3 </td>
	</tr>
	<tr>
	<td>CEVAL</td>
	<td>77.2</td>
	<td>75.6</td>
	<td>56.7</td>
	<td>55.2</td>
	<td>52.8</td>
	<td>46.1</td>
	<td>73.6 </td>
	</tr>
	<tr>
	<td>AlignBench v1.1</td>
	<td>7.10</td>
	<td>6.61</td>
	<td>7.10</td>
	<td>5.68</td>
	<td>5.82</td>
	<td>5.73</td>
	<td>6.74 </td>
	</tr>
	<tr>
	<td>FollowBench-zh (SSR)</td>
	<td>63.0</td>
	<td>56.4</td>
	<td>57.0</td>
	<td>50.6</td>
	<td>64.6</td>
	<td>58.1</td>
	<td>66.8 </td>
	</tr>
	<tr>
	<td colspan="15" align="left"><strong>Math</strong></td>
	</tr>
	<tr>
	<td>MATH</td>
	<td>49.6</td>
	<td>50.6</td>
	<td>46.0</td>
	<td>51.9</td>
	<td>41.8</td>
	<td>46.4</td>
	<td>46.6 </td>
	</tr>
	<tr>
	<td>GSM8K</td>
	<td>82.3</td>
	<td>79.6</td>
	<td>79.7</td>
	<td>84.5</td>
	<td>76.4</td>
	<td>82.7</td>
	<td>81.1 </td>
	</tr>
	<tr>
	<td>MathBench</td>
	<td>63.4</td>
	<td>59.4</td>
	<td>45.8</td>
	<td>54.3</td>
	<td>48.9</td>
	<td>54.9</td>
	<td>65.6 </td>
	</tr>
	<tr>
	<td colspan="15" align="left"><strong>Code</strong></td>
	</tr>
	<tr>
	<td>HumanEval+</td>
	<td>70.1</td>
	<td>67.1</td>
	<td>61.6</td>
	<td>62.8</td>
	<td>66.5</td>
	<td>68.9</td>
	<td>68.3 </td>
	</tr>
	<tr>
	<td>MBPP+</td>
	<td>57.1</td>
	<td>62.2</td>
	<td>64.3</td>
	<td>55.3</td>
	<td>71.4</td>
	<td>55.8</td>
	<td>63.2 </td>
	</tr>
	<tr>
	<td>LiveCodeBench v3</td>
	<td>22.2</td>
	<td>20.2</td>
	<td>19.2</td>
	<td>20.4</td>
	<td>24.0</td>
	<td>19.6</td>
	<td>22.6 </td>
	</tr>
	<tr>
	<td colspan="15" align="left"><strong>Function Call</strong></td>
	</tr>
	<tr>
	<td>BFCL v2</td>
	<td>71.6</td>
	<td>70.1</td>
	<td>19.2</td>
	<td>73.3</td>
	<td>75.4</td>
	<td>48.4</td>
	<td>76.0 </td>
	</tr>
	<tr>
	<td colspan="15" align="left"><strong>Overall</strong></td>
	</tr>
	<tr>
	<td>Average</td>
	<td>65.3</td>
	<td>65.0</td>
	<td>57.9</td>
	<td>60.8</td>
	<td>61.0</td>
	<td>57.2</td>
	<td><strong>66.3</strong></td>
	</tr>
	</table>
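
The Average row appears to be the mean of the 15 benchmark scores, with the two benchmarks scored on a 10-point scale (MT-Bench and AlignBench v1.1) multiplied by 10 first. The README does not state this; it is an inference that reproduces the reported averages for the two columns checked below.

```python
# Benchmark order follows the table rows above.
BENCHMARKS = [
    "MMLU", "BBH", "MT-Bench", "IFEVAL", "CMMLU", "CEVAL",
    "AlignBench v1.1", "FollowBench-zh", "MATH", "GSM8K", "MathBench",
    "HumanEval+", "MBPP+", "LiveCodeBench v3", "BFCL v2",
]
# Assumption: these two are on a 10-point scale and are rescaled to 100.
TEN_POINT = {"MT-Bench", "AlignBench v1.1"}

MINICPM3_4B = [67.2, 70.2, 8.41, 68.4, 73.3, 73.6, 6.74, 66.8,
               46.6, 81.1, 65.6, 68.3, 63.2, 22.6, 76.0]
QWEN2_7B = [70.5, 64.9, 8.41, 51.0, 80.9, 77.2, 7.10, 63.0,
            49.6, 82.3, 63.4, 70.1, 57.1, 22.2, 71.6]


def overall_average(scores):
    """Mean of all scores after rescaling the 10-point benchmarks."""
    scaled = [
        score * 10 if name in TEN_POINT else score
        for name, score in zip(BENCHMARKS, scores)
    ]
    return sum(scaled) / len(scaled)


print(round(overall_average(MINICPM3_4B), 1))  # 66.3
print(round(overall_average(QWEN2_7B), 1))     # 65.3
```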


	## Statement
	* As a language model, MiniCPM3-4B generates content by learning from a vast amount of text.
	* However, it does not possess the ability to comprehend or express personal opinions or value judgments.
	* Any content generated by MiniCPM3-4B does not represent the viewpoints or positions of the model developers.
	* Therefore, when using content generated by MiniCPM3-4B, users should take full responsibility for evaluating and verifying it on their own.

	## LICENSE
	* This repository is released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
	* Use of the MiniCPM3-4B model weights must strictly follow the [MiniCPM Model License.md](https://github.com/OpenBMB/MiniCPM/blob/main/MiniCPM%20Model%20License.md).
	* The models and weights of MiniCPM3-4B are completely free for academic research. After filling out a [questionnaire](https://modelbest.feishu.cn/share/base/form/shrcnpV5ZT9EJ6xYjh3Kx0J6v8g) for registration, they are also available for free commercial use.

	## Citation

	```
	@article{hu2024minicpm,
	title={MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies},
	author={Hu, Shengding and Tu, Yuge and Han, Xu and He, Chaoqun and Cui, Ganqu and Long, Xiang and Zheng, Zhi and Fang, Yewei and Huang, Yuxiang and Zhao, Weilin and others},
	journal={arXiv preprint arXiv:2404.06395},
	year={2024}
	}
	```