INC4AI/Step-3.5-Flash-int4-AutoRound

Text Generation · 4-bit precision


INC4AI committed 4 days ago

Commit e3310c2 (verified) · Parent: 5959c5a

Update README.md

Files changed (1)

README.md +1 -1

README.md CHANGED

@@ -15,7 +15,7 @@ This model is an int4 model with group_size 128 and symmetric quantization of [s
 start a vllm server:
 ```bash
-CUDA_VISIBLE_DEVICES=2,3,5,6 vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound  --dtype half \
+vllm serve INC4AI/Step-3.5-Flash-int4-AutoRound  --dtype half \
 --host localhost  --port 4321  --served-model-name step3p5-flash --data-parallel-size 4 \
 --enable-expert-parallel   --disable-cascade-attn   --reasoning-parser step3p5   \
 --enable-auto-tool-choice   --tool-call-parser step3p5   --hf-overrides '{"num_nextn_predict_layers": 1}' \
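This commit drops the hard-coded `CUDA_VISIBLE_DEVICES=2,3,5,6` prefix from the README's launch command, so choosing GPUs is now left to whoever runs it. The removed prefix listed four devices, which lines up with `--data-parallel-size 4`. Below is a minimal pre-flight sketch, assuming the default tensor-parallel size of 1 (one GPU per data-parallel rank). The helper name `check_visible_gpus` and the device indices `0,1,2,3` are hypothetical placeholders, not part of the model card.

```bash
#!/usr/bin/env bash
# Pre-flight check before running the README's `vllm serve` command.
# Assumption: with the default tensor-parallel size of 1,
# `--data-parallel-size 4` needs exactly four GPUs visible to the process.

# Hypothetical helper: succeeds if CUDA_VISIBLE_DEVICES lists exactly "$1"
# comma-separated device entries, otherwise prints a message and returns 1.
check_visible_gpus() {
  local expected="$1"
  local -a gpus=()
  # Split the variable on commas into an array (empty/unset -> zero entries).
  IFS=',' read -r -a gpus <<< "${CUDA_VISIBLE_DEVICES:-}"
  if [ "${#gpus[@]}" -ne "$expected" ]; then
    echo "expected $expected GPUs in CUDA_VISIBLE_DEVICES, found ${#gpus[@]}" >&2
    return 1
  fi
}

# Placeholder device indices: replace with GPUs that are free on your machine.
export CUDA_VISIBLE_DEVICES=0,1,2,3
check_visible_gpus 4 && echo "ready to run vllm serve on GPUs $CUDA_VISIBLE_DEVICES"
```

If the check passes, run the README's `vllm serve` command in the same shell so it inherits the exported variable.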