ehartford committed 6cdb648 · verified · 1 parent: 457daeb

Update README.md

Files changed (1): README.md +4 -1
README.md CHANGED
@@ -6,7 +6,10 @@ I used AutoAWQ to quantize Kimi K2.
 Run with this command:
 
 ```
-$ docker run -it --rm --gpus all --network=host --shm-size=1024g --entrypoint /bin/sh -v /home/hotaisle/workspace/models:/models -v $HOME/.cache/huggingface:/root/.cache/huggingface vllm/vllm-openai:latest -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
+$ docker run -it --rm --gpus all \
+    --network=host \
+    --shm-size=1024g \
+    --entrypoint /bin/sh -v /home/hotaisle/workspace/models:/models -v $HOME/.cache/huggingface:/root/.cache/huggingface vllm/vllm-openai:latest -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
 ```
 
 It seems due to a bug in vLLM, it cannot run.
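If the vLLM bug is eventually resolved, the command above serves an OpenAI-compatible API on port 8000 (per `--port 8000`), and since Kimi-K2-Base is a base model the `/v1/completions` endpoint applies rather than chat completions. A minimal client sketch under those assumptions — the endpoint URL, prompt, and helper below are illustrative, not part of the repo:

```python
import json
from urllib import request

# Request body for the OpenAI-compatible completions API served by vLLM.
# The model name must match the --model flag passed to the server above.
payload = {
    "model": "QuixiAI/Kimi-K2-Base-AWQ",
    "prompt": "The capital of France is",  # illustrative prompt
    "max_tokens": 16,
    "temperature": 0.0,
}

def query(url="http://localhost:8000/v1/completions"):
    """Send the payload to the (assumed local) vLLM server and
    return the generated text of the first choice."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Only a sketch: it assumes the server came up on localhost and that no API key is configured (vLLM's default when `--api-key` is not set).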