I used AutoAWQ to quantize Kimi K2.
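For reference, a quantization run with AutoAWQ typically looks like the sketch below. This is a hedged reconstruction, not the exact script used here: the source repo name, output path, and `quant_config` values (4-bit weights, group size 128, GEMM kernel — AutoAWQ's common defaults) are all assumptions.

```python
# Hypothetical sketch of the AutoAWQ quantization step.
# The repo names and quant settings are assumptions, not taken from this card.
QUANT_CONFIG = {
    "zero_point": True,   # asymmetric (zero-point) quantization
    "q_group_size": 128,  # group size for per-group weight scales
    "w_bit": 4,           # 4-bit weights
    "version": "GEMM",    # kernel layout used by AutoAWQ
}

def main():
    # Imports are deferred so the sketch reads fine without AutoAWQ installed.
    from awq import AutoAWQForCausalLM
    from transformers import AutoTokenizer

    src = "moonshotai/Kimi-K2-Base"  # assumed source repo
    dst = "Kimi-K2-Base-AWQ"         # assumed local output directory

    model = AutoAWQForCausalLM.from_pretrained(src, trust_remote_code=True)
    tokenizer = AutoTokenizer.from_pretrained(src, trust_remote_code=True)

    # Calibrate and quantize the weights, then write the quantized checkpoint.
    model.quantize(tokenizer, quant_config=QUANT_CONFIG)
    model.save_quantized(dst)
    tokenizer.save_pretrained(dst)

if __name__ == "__main__":
    main()
```

Note that quantizing a model of this size requires substantial GPU memory and calibration time; the sketch only shows the shape of the API calls.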
Run with this command:
```
$ docker run -it --rm --gpus all --network=host --shm-size=1024g \
    --entrypoint /bin/sh \
    -v /home/hotaisle/workspace/models:/models \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    vllm/vllm-openai:latest \
    -c "pip install blobfile && python3 -m vllm.entrypoints.openai.api_server \
        --model QuixiAI/Kimi-K2-Base-AWQ \
        --port 8000 \
        --tensor-parallel-size 8 \
        --trust-remote-code \
        --gpu-memory-utilization 0.95 \
        --enable-prefix-caching \
        --enable-chunked-prefill \
        --dtype bfloat16"
```
Unfortunately, the quantized model currently fails to run, apparently due to a bug in vLLM.