update README.md
README.md
CHANGED
@@ -208,8 +208,35 @@ print(response.choices[0].message.content)
### Inference with [vLLM](https://github.com/vllm-project/vllm)
For now, you need to install the latest (nightly pre-release) version of vLLM:
```
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```

Then you can run inference on MiniCPM4-8B with vLLM:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM4-8B"
prompt = [{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing. "}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_num_batched_tokens=32768,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
```
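Beyond the offline `LLM` API shown above, the same model can also be exposed through vLLM's OpenAI-compatible server and queried with the standard `openai` client. The sketch below is an editor's illustration, not part of this commit: the `vllm serve` invocation, the default port 8000, and the sampling parameters are assumptions you may need to adjust.

```python
# Sketch (not from the original README): querying MiniCPM4-8B through
# vLLM's OpenAI-compatible server. Start the server first, e.g.:
#   vllm serve openbmb/MiniCPM4-8B --trust-remote-code
from openai import OpenAI

# The local vLLM server does not validate the API key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM4-8B",
    messages=[{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing."}],
    temperature=0.7,
    top_p=0.7,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```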

## Evaluation Results