marksverdhei committed
Commit fa3d478 · verified · 1 Parent(s): 3eb2c0b

Upload README.md with huggingface_hub

Files changed (1):
  1. README.md +3 -5
README.md CHANGED
@@ -29,10 +29,8 @@ GGUF quantizations of [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/
 
 | Quantization | Size | Description |
 |-------------|------|-------------|
-| Q8_0 | ~227 GB | 8-bit quantization, highest quality |
-| Q4_K_M | — | 4-bit K-quant (medium), good balance of quality and size |
-| IQ3_S | — | 3-bit importance quantization (small), compact |
-| Q2_K | — | 2-bit K-quant, smallest size |
+| Q8_0 | 227 GB | 8-bit quantization, highest quality |
+| Q4_K_M | 129 GB | 4-bit K-quant (medium), good balance of quality and size |
 
 ## Usage
 
@@ -46,5 +44,5 @@ llama-cli -m MiniMax-M2.5-Q4_K_M.gguf -p "Hello" -n 128
 ## Notes
 
 - The source model uses FP8 (`float8_e4m3fn`) precision, so Q8_0 is effectively lossless relative to the source weights.
-- This is a large MoE model. Even the smallest quant (Q2_K) requires significant memory due to the number of experts.
+- This is a large MoE model. Even Q4_K_M requires ~129 GB due to the number of experts.
 - Quantized from the official [MiniMaxAI/MiniMax-M2.5](https://huggingface.co/MiniMaxAI/MiniMax-M2.5) weights.
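As a rough sanity check on the file sizes in the updated table, GGUF size scales with total parameter count times bits per weight. The sketch below assumes roughly 214B total parameters and approximate effective bit rates (Q8_0 ≈ 8.5 bpw, Q4_K_M ≈ 4.82 bpw); none of these figures are official, they are back-of-the-envelope values chosen for illustration.

```python
def gguf_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate GGUF file size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9

# Assumed total parameter count (MoE: all experts are stored on disk).
N_PARAMS = 214e9

print(round(gguf_size_gb(N_PARAMS, 8.5)))   # Q8_0, ~8.5 bits/weight  -> 227
print(round(gguf_size_gb(N_PARAMS, 4.82)))  # Q4_K_M, ~4.82 bits/weight -> 129
```

Because every expert's weights live in the file even though only a few are active per token, disk size tracks total (not active) parameters, which is why even the 4-bit quant stays above 100 GB.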