BigDong committed on
Commit dedd77e · 1 Parent(s): 41c8915

update README.md

Files changed (1): README.md (+28 -1)
README.md CHANGED
@@ -208,8 +208,35 @@ print(response.choices[0].message.content)
 
 ### Inference with [vLLM](https://github.com/vllm-project/vllm)
 For now, you need to install the latest version of vLLM.
-```bash
+```
+pip install -U vllm \
+  --pre \
+  --extra-index-url https://wheels.vllm.ai/nightly
+```
+
+Then you can run inference on MiniCPM4-8B with vLLM:
+```python
+from transformers import AutoTokenizer
+from vllm import LLM, SamplingParams
+
+model_name = "openbmb/MiniCPM4-8B"
+prompt = [{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing."}]
+
+tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
+input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
+
+llm = LLM(
+    model=model_name,
+    trust_remote_code=True,
+    max_num_batched_tokens=32768,
+    dtype="bfloat16",
+    gpu_memory_utilization=0.8,
+)
+sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
+
+outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
+
+print(outputs[0].outputs[0].text)
 ```
 
 ## Evaluation Results
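The script in the diff calls `apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)` before handing the string to vLLM: the tokenizer wraps each message in the model's chat markup and leaves an open assistant turn for the model to complete. As a rough illustration of what that call produces, here is a minimal sketch using a hypothetical ChatML-style template (`<|im_start|>`/`<|im_end|>` markers are an assumption for illustration; MiniCPM4-8B's actual template ships with its tokenizer config and may differ):

```python
# Illustrative sketch only: the real template comes from the model's tokenizer
# config. This hypothetical ChatML-style formatter just shows the shape of the
# string apply_chat_template(..., tokenize=False, add_generation_prompt=True)
# returns.
def format_chat(messages, add_generation_prompt=True):
    parts = []
    for m in messages:
        # Each message becomes one delimited turn: role header, content, end marker.
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Leave the assistant turn open so generation continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = [{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing."}]
print(format_chat(prompt))
```

Because `tokenize=False` returns the formatted string rather than token IDs, the same text can be passed directly to `llm.generate`, which tokenizes it internally.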