mratsim committed (verified)
Commit 5b18eb5 · Parent(s): 7d33890

mention VRAM cost of BF16

Files changed (1): README.md (+1 −1)
README.md CHANGED

```diff
@@ -76,7 +76,7 @@ This strives to be the highest quality quant that can run on 192GiB VRAM
 
 > [!TIP]
 > 💡 A non-FP8 version is available at [mratsim/MiniMax-M2.1-BF16-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.1-BF16-INT4-AWQ) \
-> That version is compatible with 8x RTX 3090s and with SGLang (which doesn't support mixed quantization yet) \
+> That version is compatible with 8x RTX 3090s and with SGLang (which doesn't support mixed quantization yet) for an extra 3GiB in VRAM. \
 > This FP8+INT4 AWQ was build by merging the original FP8 self-attention weights and [mratsim/MiniMax-M2.1-BF16-INT4-AWQ](https://huggingface.co/mratsim/MiniMax-M2.1-BF16-INT4-AWQ) experts.
 
 It features:
```
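The "extra 3GiB in VRAM" figure follows from dtype widths: BF16 stores 2 bytes per parameter versus 1 byte for FP8, so keeping the self-attention weights in BF16 costs 1 extra byte per attention parameter. A minimal back-of-the-envelope sketch, where the attention parameter count (~3B) is an assumption inverted from the commit's 3GiB figure, not a number taken from the model card:

```python
# Sketch: extra VRAM from storing self-attention weights in BF16
# (2 bytes/param) rather than FP8 (1 byte/param).

GIB = 1024**3


def extra_bytes(n_params: int, wide_bytes: int = 2, narrow_bytes: int = 1) -> int:
    """Extra memory from keeping n_params in a wider dtype."""
    return n_params * (wide_bytes - narrow_bytes)


# Assumption: inverting the commit's figure, 3 GiB extra at
# 1 extra byte/param implies roughly 3 * 2**30 ~ 3.2e9 attention params.
assumed_attn_params = 3 * GIB

print(extra_bytes(assumed_attn_params) / GIB)  # → 3.0
```

This is only an order-of-magnitude consistency check; the real attention parameter count depends on the model architecture.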