Point to M2.5 not M2.1

README.md CHANGED
@@ -71,7 +71,7 @@ base_model:
 - MiniMaxAI/MiniMax-M2.5
 ---
 
-# MiniMax M2.1 (Mixed-Precision BF16 + INT4 AWQ)
+# MiniMax M2.5 (Mixed-Precision BF16 + INT4 AWQ)
 
 ## Changelog
 
@@ -83,7 +83,7 @@ base_model:
 This strives to be the highest quality quant that can run on 192GiB VRAM
 
 > [!TIP]
-> 💡This is a sister model to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.1-FP8-INT4-AWQ)
+> 💡This is a sister model to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.5-FP8-INT4-AWQ)
 > with the original model FP8 weights pre-dequantized to BF16.
 >
 > This makes it compatible with 8x3090 systems (which don't have hardware FP8)
@@ -139,7 +139,7 @@ It uses my new declarative quantization framework https://github.com/mratsim/qua
 
 The model was tested with SGLang + 2x RTX Pro 6000, here is a script suitable for such configuration with the maximum 196,608 context length. This uses 92.5GiB of VRAM with the flashinfer backend.
 
-Please refer to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ#running-script](https://huggingface.co/mratsim/MiniMax-M2.1-FP8-INT4-AWQ#running-script)
+Please refer to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ#running-script](https://huggingface.co/mratsim/MiniMax-M2.5-FP8-INT4-AWQ#running-script)
 for running it in vLLM
 
 ### Running script
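For context on the tip in the second hunk: "pre-dequantized to BF16" means the original checkpoint's FP8 weights are expanded to BF16 ahead of time, so GPUs without hardware FP8 support (the RTX 3090 is Ampere; FP8 tensor cores arrived with Hopper/Ada) can load them. A minimal sketch of that upcast, assuming a per-tensor `weight_scale` as stored in common FP8 checkpoints (names and scale granularity are assumptions, not taken from this repo's conversion code):

```python
import torch

def dequantize_fp8_to_bf16(w_fp8: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
    # Upcast through float32 so the scale multiply happens at full
    # precision, then round once down to bfloat16.
    return (w_fp8.to(torch.float32) * weight_scale.to(torch.float32)).to(torch.bfloat16)

# Toy example (torch.float8_e4m3fn needs PyTorch >= 2.1):
w8 = torch.randn(128, 128).to(torch.float8_e4m3fn)  # stand-in for a checkpoint tensor
scale = torch.tensor(0.02)                           # hypothetical per-tensor scale
w16 = dequantize_fp8_to_bf16(w8, scale)
assert w16.dtype == torch.bfloat16
```

Real FP8 checkpoints may use per-channel or per-block scales rather than a single per-tensor factor; the upcast-multiply-round pattern is the same either way.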
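Since the last hunk points at the sister repo for the vLLM instructions while this card's own "Running script" section targets SGLang, here is a hedged sketch of what an SGLang launch for this configuration could look like: 2 GPUs with tensor parallelism, the flashinfer attention backend, and the maximum 196,608-token context. The flags are standard `sglang.launch_server` options; the model path is a placeholder, not this repo's actual id:

```python
import subprocess

# Hypothetical stand-in for the README's SGLang running script,
# not the author's actual script.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "/models/MiniMax-M2.5-BF16-INT4-AWQ",  # placeholder path
    "--tp", "2",                         # tensor parallel over 2x RTX Pro 6000
    "--context-length", "196608",        # maximum context mentioned above
    "--attention-backend", "flashinfer",
    "--port", "30000",                   # SGLang's default serving port
]
subprocess.run(cmd, check=True)
```

The same command can of course be run directly from a shell; wrapping it in `subprocess.run` just keeps the sketch self-contained.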