Add config_format and load_format to vLLM args

#5
by mgoin - opened
Files changed (1)
  1. README.md +2 -2
README.md CHANGED
@@ -338,7 +338,7 @@ We recommend to use Pixtral-Large-Instruct-2411 in a server/client setting.
 1. Spin up a server:
 
 ```
-vllm serve mistralai/Pixtral-Large-Instruct-2411 --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
+vllm serve mistralai/Pixtral-Large-Instruct-2411 --config-format mistral --load-format mistral --tokenizer_mode mistral --limit_mm_per_prompt 'image=10' --tensor-parallel-size 8
 ```
 
 2. And ping the client:
@@ -523,7 +523,7 @@ messages = [
 sampling_params = SamplingParams(max_tokens=512)
 
 # note that running this model on GPU requires over 300 GB of GPU RAM
-llm = LLM(model=model_name, tokenizer_mode="mistral", tensor_parallel_size=8, limit_mm_per_prompt={"image": 4})
+llm = LLM(model=model_name, config_format="mistral", load_format="mistral", tokenizer_mode="mistral", tensor_parallel_size=8, limit_mm_per_prompt={"image": 4})
 
 outputs = llm.chat(messages, sampling_params=sampling_params)
 
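The two added flags tell vLLM to read the Mistral-native files shipped in this repository: `--config-format mistral` parses `params.json` instead of a Transformers-style `config.json`, and `--load-format mistral` loads the `consolidated*.safetensors` weights. For step 2 of the server/client setup ("And ping the client:"), the README queries the server's OpenAI-compatible API; the sketch below is a minimal illustration of such a request, assuming vLLM's default port 8000 and a placeholder image URL, not the README's exact snippet.

```
# Minimal sketch of querying the server spun up above.
# Assumes vLLM's default port 8000; the image URL is a placeholder.
import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "mistralai/Pixtral-Large-Instruct-2411",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "Describe this image."},
                    {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
                ],
            }
        ],
        "max_tokens": 512,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```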
 
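The offline-inference change mirrors the server flags as `LLM(...)` keyword arguments. Below is a self-contained sketch of the surrounding README snippet so the new arguments can be tried directly; the `messages` payload here is an illustrative placeholder, not the README's exact content.

```
# Self-contained sketch of the offline-inference snippet with the new arguments.
# The messages payload is a placeholder; the README defines its own.
from vllm import LLM, SamplingParams

model_name = "mistralai/Pixtral-Large-Instruct-2411"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}},
        ],
    }
]

sampling_params = SamplingParams(max_tokens=512)

# note that running this model on GPU requires over 300 GB of GPU RAM
llm = LLM(
    model=model_name,
    config_format="mistral",  # read the Mistral-native params.json
    load_format="mistral",    # load the consolidated .safetensors weights
    tokenizer_mode="mistral",
    tensor_parallel_size=8,
    limit_mm_per_prompt={"image": 4},
)

outputs = llm.chat(messages, sampling_params=sampling_params)
print(outputs[0].outputs[0].text)
```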