ehartford committed (verified)
Commit f515c83 · 1 Parent(s): 6cdb648

Update README.md

Files changed (1):
  1. README.md +9 -5
README.md CHANGED
@@ -6,10 +6,14 @@ I used AutoAWQ to quantize Kimi K2.
  Run with this command:
 
  ```
- $ docker run -it --rm --gpus all \
- --network=host \
- --shm-size=1024g \
- --entrypoint /bin/sh -v /home/hotaisle/workspace/models:/models -v $HOME/.cache/huggingface:/root/.cache/huggingface vllm/vllm-openai:latest -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
- ```
+ $ docker run -it --rm \
+ --gpus all \
+ --network=host \
+ --shm-size=1024g \
+ --entrypoint /bin/sh \
+ -v /home/hotaisle/workspace/models:/models \
+ -v $HOME/.cache/huggingface:/root/.cache/huggingface \
+ vllm/vllm-openai:latest \
+ -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
+ ```
 
  It seems due to a bug in vLLM, it cannot run.
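If the vLLM bug is resolved, the command above would expose an OpenAI-compatible API on port 8000. A minimal sketch of a completion request against it — the prompt text, `max_tokens`, and `temperature` values are illustrative assumptions, while the model name and port come from the docker command in the diff:

```python
import json
import urllib.request

# Base URL matches --port 8000 from the docker command above.
BASE_URL = "http://localhost:8000/v1/completions"

# Kimi-K2-Base is a base (non-chat) model, so the /v1/completions
# endpoint is the natural fit rather than /v1/chat/completions.
payload = {
    "model": "QuixiAI/Kimi-K2-Base-AWQ",
    "prompt": "The capital of France is",  # illustrative prompt
    "max_tokens": 16,                      # illustrative values
    "temperature": 0.0,
}

request = urllib.request.Request(
    BASE_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

# Sending the request requires a running server, so it is guarded here;
# connection errors are reported rather than raised.
if __name__ == "__main__":
    try:
        with urllib.request.urlopen(request, timeout=10) as resp:
            print(json.load(resp)["choices"][0]["text"])
    except OSError as exc:
        print(f"server not reachable: {exc}")
```

This is only a sketch of the request shape; any OpenAI-compatible client pointed at `http://localhost:8000/v1` would work the same way.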