INC4AI
/

Step-3.5-Flash-int4-AutoRound

Text Generation

4-bit precision

Model card Files Files and versions

INC4AI commited on 4 days ago

Commit

5ebb506

·

verified ·

1 Parent(s): e3310c2

Update README.md

Files changed (1) hide show

README.md +2 -3

README.md CHANGED Viewed

@@ -15,11 +15,10 @@ This model is an int4 model with group_size 128 and symmetric quantization of [s
 start a vllm server:
 ```bash
-vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound  --dtype half \
 --host localhost  --port 4321  --served-model-name step3p5-flash --data-parallel-size 4 \
 --enable-expert-parallel   --disable-cascade-attn   --reasoning-parser step3p5   \
---enable-auto-tool-choice   --tool-call-parser step3p5   --hf-overrides '{"num_nextn_predict_layers": 1}' \
---trust-remote-code
 ```
 benchmark test:

 start a vllm server:
 ```bash
+vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound  --dtype half --trust-remote-code \
 --host localhost  --port 4321  --served-model-name step3p5-flash --data-parallel-size 4 \
 --enable-expert-parallel   --disable-cascade-attn   --reasoning-parser step3p5   \
+--enable-auto-tool-choice   --tool-call-parser step3p5   --hf-overrides '{"num_nextn_predict_layers": 1}'
 ```
 benchmark test: