Question regarding quantization hardware and modelopt sharding

#8
by Mario12355 - opened

Hi @lukealonso ,

Thanks for sharing this NVFP4 version of MiniMax-M2.5! I really appreciate the detailed model card and the immense effort that clearly went into the custom calibration process.

I have a technical question purely out of curiosity: Given the massive size of the original model (230B parameters), what kind of GPU setup did you use for the actual quantization process?

Also, could you share whether the nvidia-modelopt recipe supports sharding the model across multiple GPUs during the quantization phase to handle the VRAM requirements?

Thanks again for your fantastic work and your time! I'd love to hear from you :)
