Update README.md
I used AutoAWQ to quantize Kimi K2.
Run with this command:

```
$ docker run -it --rm \
    --gpus all \
    --network=host \
    --shm-size=1024g \
    --entrypoint /bin/sh \
    -v /home/hotaisle/workspace/models:/models \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:latest \
    -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server --model QuixiAI/Kimi-K2-Base-AWQ --port 8000 --tensor-parallel-size 8 --trust-remote-code --gpu-memory-utilization 0.95 --enable-prefix-caching --enable-chunked-prefill --dtype bfloat16"
```

It seems that, due to a bug in vLLM, this currently cannot run.
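Once the server does come up (assuming the vLLM bug is resolved), the command above exposes an OpenAI-compatible API on port 8000. A minimal client sketch, using only the Python standard library; the prompt and sampling parameters here are illustrative, and since Kimi-K2-Base is a base (non-chat) model the plain `/v1/completions` route is used:

```python
import json
import urllib.request

# Endpoint served by vllm.entrypoints.openai.api_server; port 8000 matches
# the --port flag in the docker command above. Adjust the host if the
# container runs on another machine.
URL = "http://localhost:8000/v1/completions"

# Example request body; model name must match the --model flag.
payload = {
    "model": "QuixiAI/Kimi-K2-Base-AWQ",
    "prompt": "The capital of France is",
    "max_tokens": 16,
    "temperature": 0.0,
}

def query(url: str = URL, body: dict = payload) -> dict:
    """POST the JSON body to the completions endpoint and decode the reply."""
    req = urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

The generated text, if the server is reachable, is at `query()["choices"][0]["text"]`.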