Intel
/

DeepSeek-V3.2-int4-AutoRound

Text Generation

4-bit precision

Model card Files Files and versions

INC4AI commited on Jan 23

Commit

cc5c84e

·

verified ·

1 Parent(s): 72da91f

Update README.md

Files changed (1) hide show

README.md +19 -0

README.md CHANGED Viewed

@@ -41,6 +41,25 @@ output_text = tokenizer.decode(generated_ids[0][inputs.input_ids.shape[1] :])
 print(output_text)
 ```
 ## Generate the Model
 ```bash
 git clone -b ds-v32 https://github.com/intel/auto-round.git

 print(output_text)
 ```
+### VLLM Usage
+```bash
+# Prepare environment
+# https://docs.vllm.ai/projects/recipes/en/latest/DeepSeek/DeepSeek-V3_2.html#launching-deepseek-v32
+pip install git+https://github.com/deepseek-ai/DeepGEMM.git@v2.1.1.post3 --no-build-isolation
+git clone https://github.com/vllm-project/vllm.git
+cd vllm && git checkout 773d7073a
+VLLM_PRECOMPILED_WHEEL_COMMIT=7f42dc20bb2800d09faa72b26f25d54e26f1b694 VLLM_USE_PRECOMPILED=1 pip install --editable .
+# Start server
+VLLM_ALLREDUCE_USE_SYMM_MEM=0 NCCL_NVLS_ENABLE=0 VLLM_USE_FUSED_MOE_GROUPED_TOPK=0 \
+    vllm serve Intel/DeepSeek-V3.2-int4-AutoRound \
+        --tensor-parallel-size 4 \
+        --tokenizer-mode deepseek_v32 \
+        --tool-call-parser deepseek_v32 \
+        --enable-auto-tool-choice \
+        --reasoning-parser deepseek_v3
+```
 ## Generate the Model
 ```bash
 git clone -b ds-v32 https://github.com/intel/auto-round.git