ehartford committed 6cdb648 · verified · 1 parent: 457daeb

Update README.md

Files changed (1): README.md +4 -1
README.md CHANGED
@@ -6,7 +6,10 @@ I used AutoAWQ to quantize Kimi K2.
 Run with this command:
 
 ```
-$ docker run -it --rm --gpus all --network=host --shm-size=1024g --entrypoint /bin/sh -v /home/hotaisle/workspace/models:/models -v $HOME/.cache/huggingface:/root/.cache/huggingface vllm/vllm-openai:latest -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
+$ docker run -it --rm --gpus all \
+    --network=host \
+    --shm-size=1024g \
+    --entrypoint /bin/sh -v /home/hotaisle/workspace/models:/models -v $HOME/.cache/huggingface:/root/.cache/huggingface vllm/vllm-openai:latest -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
 ```
 
 It seems due to a bug in vLLM, it cannot run.
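If the vLLM bug is eventually resolved, the command above serves an OpenAI-compatible API on port 8000 (per `--port 8000`), and since Kimi-K2-Base is a base model the `/v1/completions` endpoint applies rather than chat completions. A minimal client sketch under those assumptions — the endpoint URL, prompt, and helper below are illustrative, not part of the repo:

```python
import json
from urllib import request

# Request body for the OpenAI-compatible completions API served by vLLM.
# The model name must match the --model flag passed to the server above.
payload = {
    "model": "QuixiAI/Kimi-K2-Base-AWQ",
    "prompt": "The capital of France is",  # illustrative prompt
    "max_tokens": 16,
    "temperature": 0.0,
}

def query(url="http://localhost:8000/v1/completions"):
    """Send the payload to the (assumed local) vLLM server and
    return the generated text of the first choice."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["text"]
```

Only a sketch: it assumes the server came up on localhost and that no API key is configured (vLLM's default when `--api-key` is not set).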