INC4AI committed on
Commit 5ebb506 · verified · 1 Parent(s): e3310c2

Update README.md

Files changed (1)
  1. README.md +2 -3
README.md CHANGED
@@ -15,11 +15,10 @@ This model is an int4 model with group_size 128 and symmetric quantization of [s
 
  start a vllm server:
  ```bash
- vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound --dtype half \
+ vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound --dtype half --trust-remote-code \
  --host localhost --port 4321 --served-model-name step3p5-flash --data-parallel-size 4 \
  --enable-expert-parallel --disable-cascade-attn --reasoning-parser step3p5 \
- --enable-auto-tool-choice --tool-call-parser step3p5 --hf-overrides '{"num_nextn_predict_layers": 1}' \
- --trust-remote-code
+ --enable-auto-tool-choice --tool-call-parser step3p5 --hf-overrides '{"num_nextn_predict_layers": 1}'
  ```
 
  benchmark test: