windniw committed · verified
Commit 69af314 · 1 Parent(s): 927ea2b

Clarify supported context length in docs (#21)

- update: clarify supported context length in docs (7d718af0cb9fc1c12440bc9912e5abcb42067873)
- update (67e71adcb7d24eadc6f88a02c6665ef3634fe197)
docs/sglang_deploy_guide.md CHANGED
@@ -27,9 +27,11 @@ The deployment process is illustrated below using MiniMax-M2.1 as an example.
 
 The following are recommended configurations; actual requirements should be adjusted based on your use case:
 
-- 4x 96GB GPUs: Supported context length of up to 400K tokens.
+- **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
 
-- 8x 144GB GPUs: Supported context length of up to 3M tokens.
+- **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.
+
+> **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.
 
 ## Deployment with Python
 
docs/sglang_deploy_guide_cn.md CHANGED
@@ -27,9 +27,11 @@
 
 以下为推荐配置,实际需求请根据业务场景调整:
 
-- 96G x4 GPU:支持 40 万 token 的总上下文。
+- **96G x4 GPU**:总 KV Cache 容量支持 40 万 token
 
-- 144G x8 GPU:支持长达 300 万 token 的总上下文。
+- **144G x8 GPU**:总 KV Cache 容量支持高达 300 万 token
+
+> **注**:以上数值为硬件支持的最大并发缓存总量,模型单序列(Single Sequence)长度上限仍为 196k。
 
 ## 使用 Python 部署
 
docs/vllm_deploy_guide.md CHANGED
@@ -26,9 +26,11 @@ The deployment process is illustrated below using MiniMax-M2.1 as an example.
 
 The following are recommended configurations; actual requirements should be adjusted based on your use case:
 
-- 4x 96GB GPUs: Supported context length of up to 400K tokens.
+- **96G x4** GPU: Supports a total KV Cache capacity of 400K tokens.
 
-- 8x 144GB GPUs: Supported context length of up to 3M tokens.
+- **144G x8** GPU: Supports a total KV Cache capacity of up to 3M tokens.
+
+> **Note**: The values above represent the total aggregate hardware KV Cache capacity. The maximum context length per individual sequence remains **196K** tokens.
 
 ## Deployment with Python
 
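The distinction the added note draws, pooled KV-cache capacity across GPUs versus the per-sequence context limit, can be illustrated with a back-of-envelope estimate. The sketch below is an assumption-laden illustration: the model dimensions (`num_layers`, `num_kv_heads`, `head_dim`) and the 60 GB of free memory per GPU are placeholders, not MiniMax-M2.1's actual architecture, and real capacity also depends on serving-engine overhead.

```python
# Hypothetical back-of-envelope estimate of aggregate KV-cache token capacity.
# All model dimensions below are illustrative placeholders, not taken from
# MiniMax-M2.1; real capacity depends on the engine's memory overhead.

def kv_cache_tokens(free_bytes_per_gpu: int, num_gpus: int,
                    num_layers: int, num_kv_heads: int,
                    head_dim: int, dtype_bytes: int = 2) -> int:
    """Tokens that fit in the pooled KV cache across all GPUs.

    Each token stores one key and one value vector per layer per KV head,
    hence the factor of 2.
    """
    bytes_per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    return (free_bytes_per_gpu * num_gpus) // bytes_per_token

# Example: 4 GPUs with ~60 GiB each left for KV cache after weights,
# and placeholder dims (48 layers, 8 KV heads, head_dim 128, fp16).
capacity = kv_cache_tokens(60 * 1024**3, 4, 48, 8, 128)
print(capacity)  # → 1310720 tokens pooled across the 4 GPUs
```

Under these placeholder numbers the pool holds over a million tokens, yet no single request may exceed the 196K per-sequence cap; the surplus capacity is spent on serving many concurrent sequences, which is exactly what the note clarifies.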
docs/vllm_deploy_guide_cn.md CHANGED
@@ -26,9 +26,11 @@
 
 以下为推荐配置,实际需求请根据业务场景调整:
 
-- 96G x4 GPU:支持 40 万 token 的总上下文。
+- **96G x4 GPU**:总 KV Cache 容量支持 40 万 token
 
-- 144G x8 GPU:支持长达 300 万 token 的总上下文。
+- **144G x8 GPU**:总 KV Cache 容量支持高达 300 万 token
+
+> **注**:以上数值为硬件支持的最大并发缓存总量,模型单序列(Single Sequence)长度上限仍为 196k。
 
 ## 使用 Python 部署
 