Upload README.md with huggingface_hub
README.md CHANGED
@@ -29,10 +29,8 @@ GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5)
 
 | Quantization | Size | Description |
 |-------------|------|-------------|
-| Q8_0 |
-| Q4_K_M |
-| IQ3_S | — | 3-bit importance quantization (small), compact |
-| Q2_K | — | 2-bit K-quant, smallest size |
+| Q8_0 | 227 GB | 8-bit quantization, highest quality |
+| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
 
 ## Usage
 
@@ -46,5 +44,5 @@ llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
 ## Notes
 
 - The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is effectively lossless relative to the source weights.
-- This is a large MoE model. Even
+- This is a large MoE model. Even Q4_K_M requires ~129GB due to the number of experts.
 - Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
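The two sizes added to the table can be roughly sanity-checked against each other. A minimal sketch, assuming llama.cpp's ballpark bits-per-weight figures (Q8_0 packs 32 weights plus an fp16 scale into 34 bytes, about 8.5 bits per weight; Q4_K_M averages roughly 4.8 bits per weight across tensors) — these figures are assumptions, not stated in this README:

```python
# Rough consistency check on the new table sizes. The bits-per-weight
# values are approximate llama.cpp figures (assumptions, not from this
# README): Q8_0 ~= 8.5 bpw, Q4_K_M ~= 4.8 bpw on average.
Q8_0_BPW = 8.5
Q4_K_M_BPW = 4.8

q8_size_gb = 227.0  # Q8_0 size from the table

# Scale the Q8_0 file size by the ratio of bits per weight.
predicted_q4_gb = q8_size_gb * Q4_K_M_BPW / Q8_0_BPW

print(f"predicted Q4_K_M size: {predicted_q4_gb:.0f} GB")
```

This lands within about a gigabyte of the 129 GB listed, consistent with the note that the expert weights dominate the file size.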