MiniMax-M2.1-nvfp4
Format: NVFP4 (W4A4)
Base model: MiniMaxAI/MiniMax-M2.1 (via QuixiAI/MiniMax-M2.1-bf16)
Calibration: 256 samples @ 4096 tokens from Rombo-Org/Optimized_Reasoning
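For anyone curious how a quant like this gets produced, here's a rough sketch using vllm-project/llm-compressor with the calibration settings above. The dataset column name, the `ignore` list, and the exact preprocessing are my assumptions for illustration, not the actual script used for this model:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "QuixiAI/MiniMax-M2.1-bf16"
NUM_SAMPLES = 256
MAX_SEQ_LEN = 4096

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto", trust_remote_code=True
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# 256 calibration samples at 4096 tokens, matching the settings above.
# Assumption: the dataset exposes a plain-text column named "text";
# adjust to whatever Rombo-Org/Optimized_Reasoning actually uses.
ds = load_dataset("Rombo-Org/Optimized_Reasoning", split="train")
ds = ds.shuffle(seed=42).select(range(NUM_SAMPLES))
ds = ds.map(
    lambda ex: tokenizer(ex["text"], max_length=MAX_SEQ_LEN, truncation=True),
    remove_columns=ds.column_names,
)

# NVFP4 = 4-bit weights and 4-bit activations (W4A4).
# Keeping lm_head in higher precision is a common default, assumed here.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
)

model.save_pretrained("MiniMax-M2.1-nvfp4", save_compressed=True)
tokenizer.save_pretrained("MiniMax-M2.1-nvfp4")
```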
Notes
As of now I've been unable to get this to run in vLLM. Using a setup similar to the other NVFP4 quant of MiniMax-M2.1 still fails to start. There's probably a trick to get it running, but I haven't found it yet. I'll probably open an issue on the vLLM GitHub to ask for guidance. For reference, this is the command I've been trying:
```bash
sudo docker run --runtime nvidia --gpus all \
  -p 8000:8000 --ipc=host \
  -e VLLM_USE_FLASHINFER_MOE_FP4=1 \
  -e NCCL_IB_DISABLE=1 \
  -e NCCL_NVLS_ENABLE=0 \
  -e NCCL_P2P_DISABLE=0 \
  -e NCCL_SHM_DISABLE=0 \
  -e VLLM_USE_V1=1 \
  -e SAFETENSORS_FAST_GPU=1 \
  vllm/vllm-openai:nightly-96142f209453a381fcaf9d9d010bbf8711119a77 \
  --model Firworks/MiniMax-M2.1-nvfp4 \
  --dtype auto \
  --max-model-len 32768 \
  --tensor-parallel-size 2 \
  --trust_remote_code \
  --enable-auto-tool-choice \
  --tool-call-parser minimax_m2 \
  --reasoning-parser minimax_m2_append_think \
  --all2all-backend pplx \
  --enable-expert-parallel
```
This was tested on a 2 x RTX Pro 6000 Blackwell cloud instance.
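If you do get the server up, it exposes the usual OpenAI-compatible API on port 8000, so a minimal smoke test would look like this (the api_key value is a placeholder; vLLM doesn't check it by default):

```python
from openai import OpenAI

# vLLM serves an OpenAI-compatible API; point the client at the container.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Firworks/MiniMax-M2.1-nvfp4",
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)
```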
If there are other models you're interested in seeing quantized to NVFP4 for use on the DGX Spark or other modern Blackwell (or newer) cards, let me know. I'm trying to make more NVFP4 models available so more people can try them out.