update README.md
README.md
CHANGED
@@ -208,8 +208,35 @@ print(response.choices[0].message.content)
### Inference with [vLLM](https://github.com/vllm-project/vllm)
For now, you need to install the latest (nightly pre-release) version of vLLM:
```
pip install -U vllm \
    --pre \
    --extra-index-url https://wheels.vllm.ai/nightly
```

Then you can run inference on MiniCPM4-8B with vLLM:
```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_name = "openbmb/MiniCPM4-8B"
prompt = [{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing. "}]

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)

llm = LLM(
    model=model_name,
    trust_remote_code=True,
    max_num_batched_tokens=32768,
    dtype="bfloat16",
    gpu_memory_utilization=0.8,
)
sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)

outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)

print(outputs[0].outputs[0].text)
```
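Beyond the offline `LLM` API shown above, the same model can also be exposed through vLLM's OpenAI-compatible server and queried with the standard `openai` client. The sketch below is an editor's illustration, not part of this commit: the `vllm serve` invocation, the default port 8000, and the sampling parameters are assumptions you may need to adjust.

```python
# Sketch (not from the original README): querying MiniCPM4-8B through
# vLLM's OpenAI-compatible server. Start the server first, e.g.:
#   vllm serve openbmb/MiniCPM4-8B --trust-remote-code
from openai import OpenAI

# The local vLLM server does not validate the API key, so any placeholder works.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="openbmb/MiniCPM4-8B",
    messages=[{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing."}],
    temperature=0.7,
    top_p=0.7,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```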

## Evaluation Results