kaihangj Claude Opus 4.6 (1M context) committed on
Commit bc1518d · 1 Parent(s): 6c40e0e

Update README.md: add vLLM as supported runtime engine


Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Files changed (1)
  1. README.md +3 -2
README.md CHANGED

@@ -59,6 +59,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 
 ## Software Integration:
 **Supported Runtime Engine(s):** <br>
+* vLLM <br>
 * SGLang <br>
 
 **Supported Hardware Microarchitecture Compatibility:** <br>
@@ -98,11 +99,11 @@ The model version is NVFP4 1.0 version and is quantized with nvidia-modelopt **v
 
 
 ## Inference:
-**Acceleration Engine:** SGLang <br>
+**Acceleration Engine:** vLLM, SGLang <br>
 **Test Hardware:** B300 <br>
 
 ## Post Training Quantization
-This model was obtained by quantizing the weights and activations of GLM-5 to NVFP4 data type, ready for inference with SGLang. Only the weights and activations of the linear operators within transformer blocks in MoE are quantized.
+This model was obtained by quantizing the weights and activations of GLM-5 to NVFP4 data type, ready for inference with vLLM and SGLang. Only the weights and activations of the linear operators within transformer blocks in MoE are quantized.
 
 ## Usage
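The diff above adds vLLM alongside SGLang as a supported runtime for the NVFP4 checkpoint. A minimal launch sketch for both engines; the repository id, tensor-parallel degree, and port are placeholders (assumptions, not taken from this commit):

```shell
# Serve the NVFP4-quantized model with vLLM's OpenAI-compatible server.
# <org>/<glm-5-nvfp4-repo> is a hypothetical repo id; adjust TP size to your GPUs.
vllm serve <org>/<glm-5-nvfp4-repo> \
    --tensor-parallel-size 8 \
    --port 8000

# Equivalent SGLang launch (same placeholder repo id).
python -m sglang.launch_server \
    --model-path <org>/<glm-5-nvfp4-repo> \
    --tp 8 \
    --port 8000
```

Either command exposes an OpenAI-compatible endpoint on the chosen port; the quantization format is read from the checkpoint's config, so no extra quantization flag is shown here.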