Point to M2.5 not M2.1

README.md CHANGED
@@ -71,7 +71,7 @@ base_model:
 - MiniMaxAI/MiniMax-M2.5
 ---
 
-# MiniMax M2.1 (Mixed-Precision BF16 + INT4 AWQ)
+# MiniMax M2.5 (Mixed-Precision BF16 + INT4 AWQ)
 
 ## Changelog
 
@@ -83,7 +83,7 @@ base_model:
 This strives to be the highest quality quant that can run on 192GiB VRAM
 
 > [!TIP]
-> 💡This is a sister model to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.1-FP8-INT4-AWQ)
+> 💡This is a sister model to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.5-FP8-INT4-AWQ)
 > with the original model FP8 weights pre-dequantized to BF16.
 >
 > This makes it compatible with 8x3090 systems (which don't have hardware FP8)
@@ -139,7 +139,7 @@ It uses my new declarative quantization framework https://github.com/mratsim/qua
 
 The model was tested with SGLang + 2x RTX Pro 6000, here is a script suitable for such configuration with the maximum 196,608 context length. This uses 92.5GiB of VRAM with the flashinfer backend.
 
-Please refer to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ#running-script](https://huggingface.co/mratsim/MiniMax-M2.1-FP8-INT4-AWQ#running-script)
+Please refer to [mratsim/MiniMax-M2.5-FP8-INT4-AWQ#running-script](https://huggingface.co/mratsim/MiniMax-M2.5-FP8-INT4-AWQ#running-script)
 for running it in vLLM
 
 ### Running script
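For context on the tip in the second hunk: "pre-dequantized to BF16" means the original checkpoint's FP8 weights are expanded to BF16 ahead of time, so GPUs without hardware FP8 support (the RTX 3090 is Ampere; FP8 tensor cores arrived with Hopper/Ada) can load them. A minimal sketch of that upcast, assuming a per-tensor `weight_scale` as stored in common FP8 checkpoints (names and scale granularity are assumptions, not taken from this repo's conversion code):

```python
import torch

def dequantize_fp8_to_bf16(w_fp8: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
    # Upcast through float32 so the scale multiply happens at full
    # precision, then round once down to bfloat16.
    return (w_fp8.to(torch.float32) * weight_scale.to(torch.float32)).to(torch.bfloat16)

# Toy example (torch.float8_e4m3fn needs PyTorch >= 2.1):
w8 = torch.randn(128, 128).to(torch.float8_e4m3fn)  # stand-in for a checkpoint tensor
scale = torch.tensor(0.02)                           # hypothetical per-tensor scale
w16 = dequantize_fp8_to_bf16(w8, scale)
assert w16.dtype == torch.bfloat16
```

Real FP8 checkpoints may use per-channel or per-block scales rather than a single per-tensor factor; the upcast-multiply-round pattern is the same either way.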
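Since the last hunk points at the sister repo for the vLLM instructions while this card's own "Running script" section targets SGLang, here is a hedged sketch of what an SGLang launch for this configuration could look like: 2 GPUs with tensor parallelism, the flashinfer attention backend, and the maximum 196,608-token context. The flags are standard `sglang.launch_server` options; the model path is a placeholder, not this repo's actual id:

```python
import subprocess

# Hypothetical stand-in for the README's SGLang running script,
# not the author's actual script.
cmd = [
    "python", "-m", "sglang.launch_server",
    "--model-path", "/models/MiniMax-M2.5-BF16-INT4-AWQ",  # placeholder path
    "--tp", "2",                         # tensor parallel over 2x RTX Pro 6000
    "--context-length", "196608",        # maximum context mentioned above
    "--attention-backend", "flashinfer",
    "--port", "30000",                   # SGLang's default serving port
]
subprocess.run(cmd, check=True)
```

The same command can of course be run directly from a shell; wrapping it in `subprocess.run` just keeps the sketch self-contained.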