ehartford committed on
Commit 904cbfd · verified · 1 Parent(s): 2c0a848

Create README.md

Files changed (1)
  1. README.md +9 -0
README.md ADDED
@@ -0,0 +1,9 @@
+ I used AutoAWQ to quantize Kimi K2.
+
+ Run with this command:
+
+ ```
+ $ docker run -it --rm --gpus all --network=host --shm-size=1024g --entrypoint /bin/sh -v /home/hotaisle/workspace/models:/models -v $HOME/.cache/huggingface:/root/.cache/huggingface vllm/vllm-openai:latest -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
+ ```
+
+ The model does not currently run, apparently because of a bug in vLLM.
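
Not part of the commit: a minimal sketch of how the server could be queried if it does start. It assumes the server listens on `localhost:8000` (from `--port 8000` and `--network=host` in the command above) and that the model is served under the name passed to `--model`. Because this is a base model, the sketch uses the OpenAI-compatible `/v1/completions` route rather than the chat route. The helper names `build_completion_request` and `complete`, the prompt, and the `max_tokens` value are illustrative, not taken from the README.

```python
import json
import urllib.request

# Assumptions: host/port and model name are taken from the docker command;
# everything else here is illustrative.
BASE_URL = "http://localhost:8000"
MODEL = "QuixiAI/Kimi-K2-Base-AWQ"


def build_completion_request(prompt, max_tokens=32):
    """Build a POST request for the OpenAI-compatible /v1/completions route."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        BASE_URL + "/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=32, timeout=60):
    """Send the request and return the text of the first completion."""
    req = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a running server, `print(complete("The capital of France is"))` would print the generated continuation; if the server failed to start, the call raises a `urllib.error.URLError` instead.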