Update README.md: add vLLM as supported runtime engine

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

README.md CHANGED

@@ -59,6 +59,7 @@ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated sys
 
 ## Software Integration:
 **Supported Runtime Engine(s):** <br>
+* vLLM <br>
 * SGLang <br>
 
 **Supported Hardware Microarchitecture Compatibility:** <br>
@@ -98,11 +99,11 @@ The model version is NVFP4 1.0 version and is quantized with nvidia-modelopt **v
 
 
 ## Inference:
-**Acceleration Engine:** SGLang <br>
+**Acceleration Engine:** vLLM, SGLang <br>
 **Test Hardware:** B300 <br>
 
 ## Post Training Quantization
-This model was obtained by quantizing the weights and activations of GLM-5 to NVFP4 data type, ready for inference with SGLang. Only the weights and activations of the linear operators within transformer blocks in MoE are quantized.
+This model was obtained by quantizing the weights and activations of GLM-5 to NVFP4 data type, ready for inference with vLLM and SGLang. Only the weights and activations of the linear operators within transformer blocks in MoE are quantized.
 
 ## Usage
 
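The diff above adds vLLM alongside SGLang as a supported runtime for the NVFP4-quantized checkpoint. As a rough illustration of what that support enables, here is a minimal, hypothetical launch sketch using vLLM's OpenAI-compatible server; the model id is a placeholder (the README does not state the published repository name), and the parallelism setting is an assumption to be adjusted for the actual deployment:

```shell
# Hypothetical sketch only: serve an NVFP4-quantized MoE checkpoint with vLLM.
# <your-nvfp4-model-id> is a placeholder, not the model's actual repository name.
vllm serve <your-nvfp4-model-id> \
    --tensor-parallel-size 8   # assumed multi-GPU layout; tune to your hardware
```

Once the server is up, it exposes the standard OpenAI-compatible `/v1/chat/completions` endpoint, so existing OpenAI-client code can target it by changing only the base URL.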