mention VRAM cost of BF16
README.md
CHANGED
```diff
@@ -76,7 +76,7 @@ This strives to be the highest quality quant that can run on 192GiB VRAM
 
 > [!TIP]
 > 💡 A non-FP8 version is available at [mratsim/MiniMax-M2.1-BF16-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.1-BF16-INT4-AWQ) \
-> That version is compatible with 8x RTX 3090s and with SGLang (which doesn't support mixed quantization yet) \
+> That version is compatible with 8x RTX 3090s and with SGLang (which doesn't support mixed quantization yet) for an extra 3GiB of VRAM. \
 > This FP8+INT4 AWQ was built by merging the original FP8 self-attention weights and [mratsim/MiniMax-M2.1-BF16-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.1-BF16-INT4-AWQ) experts.
 
 It features:
```
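The note above says the FP8+INT4 checkpoint was produced by merging the original FP8 self-attention weights with the INT4-AWQ experts. A minimal sketch of that idea is routing each tensor, by name, to one of the two source checkpoints; the tensor-name patterns and the `pick_source` helper below are illustrative assumptions, not the model's actual layout or the author's script.

```python
# Hedged sketch: build a mixed FP8+INT4 state dict by choosing, per tensor
# name, which source checkpoint supplies it. Names are hypothetical examples.

def pick_source(name: str) -> str:
    # MoE expert weights come from the INT4-AWQ checkpoint (assumed ".experts."
    # naming); everything else (attention, norms, embeddings) stays FP8.
    return "int4_awq" if ".experts." in name else "fp8"

# Illustrative tensor names only; a real merge would iterate over the
# safetensors shards of both checkpoints and copy the chosen tensors.
names = [
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.experts.3.w1.qweight",
]
merged_plan = {n: pick_source(n) for n in names}
print(merged_plan)
```

The routing rule is the whole trick: self-attention stays in the original FP8 format, while the memory-dominant expert weights come from the AWQ INT4 quantization.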