INC4AI committed on
Commit e3310c2 · verified · 1 Parent(s): 5959c5a

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -15,7 +15,7 @@ This model is an int4 model with group_size 128 and symmetric quantization of [s
 
  start a vllm server:
  ```bash
- CUDA_VISIBLE_DEVICES=2,3,5,6 vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound --dtype half \
+ vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound --dtype half \
  --host localhost --port 4321 --served-model-name step3p5-flash --data-parallel-size 4 \
  --enable-expert-parallel --disable-cascade-attn --reasoning-parser step3p5 \
  --enable-auto-tool-choice --tool-call-parser step3p5 --hf-overrides '{"num_nextn_predict_layers": 1}' \