Update README.md
README.md
@@ -22,8 +22,8 @@ pip download fastllm/Kimi-K2-Instruct-INT4MIX
``` sh
# Assume the model has been downloaded to /root/Kimi-K2-Instruct-INT4MIX
-
-
```

# Optimization
@@ -36,7 +36,7 @@ pip server /root/Kimi-K2-Instruct-INT4MIX # API server mode (default model
For example:

``` sh
-
```

## Multiple CPUs (multiple NUMA nodes)
@@ -77,8 +77,8 @@ pip download fastllm/Kimi-K2-Instruct-INT4MIX
``` sh
# Assuming the model is downloaded in /root/Kimi-K2-Instruct-INT4MIX
-
-
```

# optimize
@@ -91,7 +91,7 @@ If the speed is extremely slow, it may be due to too many threads—consider red
for example:

``` sh
-
```

## multi cpu (multi numa node)

``` sh
# Assume the model has been downloaded to /root/Kimi-K2-Instruct-INT4MIX
+ftllm run /root/Kimi-K2-Instruct-INT4MIX # chat mode
+ftllm server /root/Kimi-K2-Instruct-INT4MIX # API server mode (default model name = /root/Kimi-K2-Instruct-INT4MIX, port = 8080)
```

# Optimization
For example:

``` sh
+ftllm server /root/Kimi-K2-Instruct-INT4MIX -t 12
```

## Multiple CPUs (multiple NUMA nodes)
``` sh
# Assuming the model is downloaded in /root/Kimi-K2-Instruct-INT4MIX
+ftllm run /root/Kimi-K2-Instruct-INT4MIX # chat
+ftllm server /root/Kimi-K2-Instruct-INT4MIX # api server (default model_name = /root/Kimi-K2-Instruct-INT4MIX, port = 8080)
```

# optimize
for example:

``` sh
+ftllm server /root/Kimi-K2-Instruct-INT4MIX -t 12
```

## multi cpu (multi numa node)