nvidia/MiniMax-M3-NVFP4

@@ -116,7 +116,7 @@ This model is NVFP4 quantized with nvidia-modelopt **v0.44.0**
 This model was obtained by quantizing the weights and activations of Minimax-M3 to NVFP4 data type. This optimization reduces the number of bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by approximately 2x.
 ## Usage
-To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you currently need the nightly docker image that includes MiniMax-M3 NVFP4 support from [vllm-project/vllm#46380](https://github.com/vllm-project/vllm/pull/46380) (not yet in a stable release). Launch the nightly image and run the sample command below:
+To serve this checkpoint with [vLLM](https://github.com/vllm-project/vllm), you currently need the nightly docker image. Launch the nightly image and run the sample command below:
 ```
 vllm serve nvidia/MiniMax-M3-NVFP4 \