Clarify supported context length in docs (#21)
Commits:
- update: clarify supported context length in docs (7d718af0cb9fc1c12440bc9912e5abcb42067873)
- update (67e71adcb7d24eadc6f88a02c6665ef3634fe197)
docs/sglang_deploy_guide.md CHANGED

```diff
@@ -27,9 +27,11 @@ The deployment process is illustrated below using MiniMax-M2.1 as an example.
 
 The following are recommended configurations; actual requirements should be adjusted based on your use case:
 
-- 96G x4 GPU
+- **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
 
-- 144G x8 GPU
+- **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.
+
+> **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.
 
 ## Deployment with Python
 
```
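The distinction the added note draws (total aggregate KV Cache capacity vs. the per-sequence context limit) can be illustrated with a small back-of-envelope sketch. The figures are only the numbers quoted in the diff; the helper function is hypothetical, and 196K is taken literally as 196,000 tokens (the exact limit may differ, e.g. 196,608):

```python
# Hypothetical helper illustrating the note in the diff: the KV Cache is a
# shared pool across all concurrent sequences, while each individual
# sequence is still capped at the model's context limit.

MAX_CONTEXT_PER_SEQUENCE = 196_000  # per-sequence cap quoted in the docs (196K)

def max_full_length_sequences(total_kv_cache_tokens: int) -> int:
    """How many sequences at the full per-sequence context fit in the
    aggregate KV Cache pool at the same time (floor division)."""
    return total_kv_cache_tokens // MAX_CONTEXT_PER_SEQUENCE

# Capacities quoted in the diff: 400K tokens (96G x4), 3M tokens (144G x8).
print(max_full_length_sequences(400_000))    # 96G x4 configuration
print(max_full_length_sequences(3_000_000))  # 144G x8 configuration
```

So a larger aggregate capacity buys more concurrent sequences (or more cached prefill), not a longer maximum context for any single request.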
docs/sglang_deploy_guide_cn.md CHANGED

```diff
@@ -27,9 +27,11 @@
 
 以下为推荐配置,实际需求请根据业务场景调整:
 
-- 96G x4 GPU
+- **96G x4 GPU**:总 KV Cache 容量支持 40 万 token。
 
-- 144G x8 GPU
+- **144G x8 GPU**:总 KV Cache 容量支持高达 300 万 token。
+
+> **注**:以上数值为硬件支持的最大并发缓存总量,模型单序列(Single Sequence)长度上限仍为 196k。
 
 ## 使用 Python 部署
 
```
docs/vllm_deploy_guide.md CHANGED

```diff
@@ -26,9 +26,11 @@ The deployment process is illustrated below using MiniMax-M2.1 as an example.
 
 The following are recommended configurations; actual requirements should be adjusted based on your use case:
 
-- 96G x4 GPU
+- **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
 
-- 144G x8 GPU
+- **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.
+
+> **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.
 
 ## Deployment with Python
 
```
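The per-sequence 196K cap described in these guides is normally enforced at server launch via each engine's context-length flag. A hedged sketch, not taken from the guides themselves: the model path, parallel sizes, and the literal `196000` value are assumptions, though the flag names are the standard ones for each server.

```shell
# vLLM: cap each individual sequence at the documented 196K context.
# Tensor-parallel size chosen to match the 96G x4 configuration (assumption).
vllm serve MiniMaxAI/MiniMax-M2.1 \
  --tensor-parallel-size 4 \
  --max-model-len 196000

# SGLang equivalent: the per-sequence cap is --context-length.
python -m sglang.launch_server \
  --model-path MiniMaxAI/MiniMax-M2.1 \
  --tp 4 \
  --context-length 196000
```

Note that neither flag changes the aggregate KV Cache pool; that is determined by available GPU memory after weights are loaded (and by knobs such as vLLM's `gpu-memory-utilization`).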
docs/vllm_deploy_guide_cn.md CHANGED

```diff
@@ -26,9 +26,11 @@
 
 以下为推荐配置,实际需求请根据业务场景调整:
 
-- 96G x4 GPU
+- **96G x4 GPU**:总 KV Cache 容量支持 40 万 token。
 
-- 144G x8 GPU
+- **144G x8 GPU**:总 KV Cache 容量支持高达 300 万 token。
+
+> **注**:以上数值为硬件支持的最大并发缓存总量,模型单序列(Single Sequence)长度上限仍为 196k。
 
 ## 使用 Python 部署
 
```