Commit 7d718af by xuebi
1 Parent(s): 927ea2b
update: clearify supported context length in docs
docs/sglang_deploy_guide.md
@@ -27,9 +27,9 @@ The deployment process is illustrated below using MiniMax-M2.1 as an example.
 
 The following are recommended configurations; actual requirements should be adjusted based on your use case:
 
-- 4x 96GB GPUs:
+- 4x 96GB GPUs: Supports 400K aggregate KV cache tokens. (Max 196K per sequence)
 
-- 8x 144GB GPUs:
+- 8x 144GB GPUs: Supports 3M aggregate KV cache tokens. (Max 196K per sequence)
 
 ## Deployment with Python
 
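The distinction this change draws between the aggregate KV cache budget and the per-sequence cap can be sketched with a little arithmetic. A minimal illustration (the helper function is hypothetical, not part of either project; the figures are the ones from the diff above):

```python
def max_full_length_sequences(aggregate_kv_tokens: int, max_seq_tokens: int) -> int:
    """How many maximum-length sequences fit in the shared KV cache at once."""
    return aggregate_kv_tokens // max_seq_tokens

# 4x 96GB GPUs: 400K aggregate KV cache tokens, 196K cap per sequence
print(max_full_length_sequences(400_000, 196_000))    # 2 full-length sequences

# 8x 144GB GPUs: 3M aggregate KV cache tokens, same per-sequence cap
print(max_full_length_sequences(3_000_000, 196_000))  # 15 full-length sequences
```

In other words, the per-sequence cap bounds any single request's context, while the aggregate budget bounds how many long requests can be in flight concurrently.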
docs/sglang_deploy_guide_cn.md
@@ -27,9 +27,9 @@
 
 The following are recommended configurations; adjust actual requirements to your business scenario:
 
-- 96G x4 GPU: supports 400K tokens
+- 96G x4 GPU: supports a total context of 400K tokens (max 196K per sequence).
 
-- 144G x8 GPU: supports up to 3M tokens
+- 144G x8 GPU: supports a total context of up to 3M tokens (max 196K per sequence).
 
 ## Deployment with Python
 
docs/vllm_deploy_guide.md
@@ -26,9 +26,9 @@ The deployment process is illustrated below using MiniMax-M2.1 as an example.
 
 The following are recommended configurations; actual requirements should be adjusted based on your use case:
 
-- 4x 96GB GPUs:
+- 4x 96GB GPUs: Supports 400K aggregate KV cache tokens. (Max 196K per sequence)
 
-- 8x 144GB GPUs:
+- 8x 144GB GPUs: Supports 3M aggregate KV cache tokens. (Max 196K per sequence)
 
 ## Deployment with Python
 
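For the vLLM guide, the per-sequence cap above would typically be enforced at launch time. A hedged launch sketch for the 4x 96GB configuration (the model path is a placeholder, and mapping the 196K cap to exactly 196608 tokens is an assumption; `--tensor-parallel-size` and `--max-model-len` are standard vLLM server flags):

```shell
# Illustrative launch for the 4-GPU configuration (not from the docs above).
# --max-model-len caps each individual sequence; the aggregate KV cache
# budget then follows from the remaining GPU memory across all 4 GPUs.
vllm serve MiniMaxAI/MiniMax-M2.1 \
    --tensor-parallel-size 4 \
    --max-model-len 196608
```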
docs/vllm_deploy_guide_cn.md
@@ -26,9 +26,9 @@
 
 The following are recommended configurations; adjust actual requirements to your business scenario:
 
-- 96G x4 GPU: supports 400K tokens
+- 96G x4 GPU: supports a total context of 400K tokens (max 196K per sequence).
 
-- 144G x8 GPU: supports up to 3M tokens
+- 144G x8 GPU: supports a total context of up to 3M tokens (max 196K per sequence).
 
 ## Deployment with Python
 