Question regarding quantization hardware and modelopt sharding

#8
by Mario12355 - opened

Hi @lukealonso ,

Thanks for sharing this NVFP4 version of MiniMax-M2.5! I really appreciate the detailed model card and the immense effort that clearly went into the custom calibration process.

I have a technical question purely out of curiosity: Given the massive size of the original model (230B parameters), what kind of GPU setup did you use for the actual quantization process?

Also, could you share whether the nvidia-modelopt recipe supports sharding the model across multiple GPUs during the quantization phase to handle the VRAM requirements?

Thanks again for your fantastic work and your time! I'd love to hear from you :)
