majentik committed on
Commit cdd4eb2 · verified · 1 parent: a21a709

docs: Tier 2 polish — variant matrix + quant trade-off

Files changed (1)
  1. README.md +46 -10
README.md CHANGED
@@ -2,18 +2,16 @@
  base_model: google/gemma-4-31B
  library_name: mlx
  tags:
- - rotorquant
- - kv-cache-quantization
- - gemma
- - gemma4
- - multimodal
- - quantized
- - mlx
- - 2bit
+ - rotorquant
+ - kv-cache-quantization
+ - gemma
+ - gemma4
+ - multimodal
+ - quantized
+ - mlx
+ - 2bit
  license: apache-2.0
  pipeline_tag: image-text-to-text
- language:
- - en
  ---
 
  # Gemma 4 31B - RotorQuant MLX 2-bit
@@ -105,3 +103,41 @@ This model requires approximately 9 GB of unified memory. Recommended hardware:
  - [majentik/gemma-4-31B-TurboQuant-MLX-2bit](https://huggingface.co/majentik/gemma-4-31B-TurboQuant-MLX-2bit) -- TurboQuant MLX 2-bit variant
  - [RotorQuant GitHub](https://github.com/scrya-com/rotorquant)
  - [MLX Framework](https://github.com/ml-explore/mlx)
+
+ ## Quant trade-off (MLX lane)
+
+ | Bits | Approx size | Use case | Recommendation |
+ |---|---|---|---|
+ | **2-bit** | ~8.1 GB | Aggressive quantization | **Very low-RAM Macs** |
+ | 3-bit | ~11 GB | Lossy but small | Low-RAM Macs |
+ | 4-bit | ~13 GB | Balanced default | Recommended for most Macs |
+ | 5-bit | ~16 GB | Higher fidelity | Quality-sensitive |
+ | 6-bit | ~19 GB | Approaching FP16 quality | High-fidelity |
+ | 8-bit | ~24 GB | Near-lossless reference | Fidelity-critical work |
+
+ (Current variant — **2bit** — is bolded.)
+
+ ## Variants in this family
+
+ (Showing 18 sibling variants under `majentik/gemma4-31b-*`. The current variant — `RotorQuant-MLX-2bit` — is **bolded**.)
+
+ | Variant | Runtime | Approx size | Use case |
+ |---|---|---|---|
+ | [RotorQuant](https://huggingface.co/majentik/gemma4-31b-rotorquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
+ | [RotorQuant-AWQ-4bit](https://huggingface.co/majentik/gemma4-31b-rotorquant-awq-4bit) | transformers | ~19 GB | GPU 4-bit (AutoAWQ) |
+ | [RotorQuant-AWQ-8bit](https://huggingface.co/majentik/gemma4-31b-rotorquant-awq-8bit) | transformers | ~34 GB | GPU 8-bit (AutoAWQ) |
+ | [RotorQuant-GGUF-IQ4_XS](https://huggingface.co/majentik/gemma4-31b-rotorquant-gguf-IQ4_XS) | llama.cpp | ~27 GB | Lossy 4-bit, low-RAM CPU/edge |
+ | [RotorQuant-GGUF-Q2_K](https://huggingface.co/majentik/gemma4-31b-rotorquant-gguf-Q2_K) | llama.cpp | ~19 GB | Lossy, low-RAM CPU/edge |
+ | [RotorQuant-GGUF-Q3_K_M](https://huggingface.co/majentik/gemma4-31b-rotorquant-gguf-Q3_K_M) | llama.cpp | ~24 GB | Smaller 3-bit, CPU-friendly |
+ | [RotorQuant-GGUF-Q4_K_M](https://huggingface.co/majentik/gemma4-31b-rotorquant-gguf-Q4_K_M) | llama.cpp | ~34 GB | Balanced default |
+ | [RotorQuant-GGUF-Q5_K_M](https://huggingface.co/majentik/gemma4-31b-rotorquant-gguf-Q5_K_M) | llama.cpp | ~41 GB | Higher fidelity, more RAM |
+ | [RotorQuant-GGUF-Q8_0](https://huggingface.co/majentik/gemma4-31b-rotorquant-gguf-Q8_0) | llama.cpp | ~65 GB | Near-lossless reference |
+ | **RotorQuant-MLX-2bit** | mlx-lm | ~9.9 GB | Apple Silicon, smallest |
+ | [RotorQuant-MLX-4bit](https://huggingface.co/majentik/gemma4-31b-rotorquant-mlx-4bit) | mlx-lm | ~19 GB | Apple Silicon balanced |
+ | [RotorQuant-MLX-8bit](https://huggingface.co/majentik/gemma4-31b-rotorquant-mlx-8bit) | mlx-lm | ~37 GB | Apple Silicon reference |
+ | [TurboQuant](https://huggingface.co/majentik/gemma4-31b-turboquant) | runtime modifier | n/a | KV-cache root (weight-agnostic) |
+ | [TurboQuant-AWQ-4bit](https://huggingface.co/majentik/gemma4-31b-turboquant-awq-4bit) | transformers | ~19 GB | GPU 4-bit (AutoAWQ) |
+ | [TurboQuant-AWQ-8bit](https://huggingface.co/majentik/gemma4-31b-turboquant-awq-8bit) | transformers | ~34 GB | GPU 8-bit (AutoAWQ) |
+ | [TurboQuant-MLX-2bit](https://huggingface.co/majentik/gemma4-31b-turboquant-mlx-2bit) | mlx-lm | ~9.9 GB | Apple Silicon, smallest |
+ | [TurboQuant-MLX-4bit](https://huggingface.co/majentik/gemma4-31b-turboquant-mlx-4bit) | mlx-lm | ~19 GB | Apple Silicon balanced |
+ | [TurboQuant-MLX-8bit](https://huggingface.co/majentik/gemma4-31b-turboquant-mlx-8bit) | mlx-lm | ~37 GB | Apple Silicon reference |
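Both tables added in this commit quote approximate checkpoint sizes. As a rough cross-check, a group-quantized checkpoint weighs about parameters × (bits + per-group overhead) / 8 bytes. The sketch below is a minimal estimator under stated assumptions that do not come from this commit: every weight is quantized in MLX-style affine groups of 64 with one 16-bit scale and one 16-bit bias per group, the parameter count is taken as a nominal 31 billion from the model name, and metadata, tokenizer files, and any layers kept at higher precision are ignored.

```python
# Rough size estimator for group-quantized weights (a sketch, not the
# repository's tooling). ASSUMPTIONS: all parameters are quantized; each
# group of `group_size` weights stores one scale and one bias, each
# `scale_bits` wide; non-weight files and unquantized layers are ignored.


def effective_bits(bits: int, group_size: int = 64, scale_bits: int = 16) -> float:
    """Bits per weight, including the per-group scale and bias overhead."""
    return bits + 2 * scale_bits / group_size


def estimate_size_gb(n_params: float, bits: int, group_size: int = 64) -> float:
    """Approximate weight size in decimal gigabytes (1 GB = 1e9 bytes)."""
    total_bits = n_params * effective_bits(bits, group_size)
    return total_bits / 8 / 1e9


if __name__ == "__main__":
    # Nominal parameter count inferred from the "31B" in the model name.
    N_PARAMS = 31e9
    for bits in (2, 3, 4, 5, 6, 8):
        print(f"{bits}-bit: ~{estimate_size_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 2-bit estimate comes out just under 10 GB, which sits closer to the ~9.9 GB in the variant matrix than to the ~8.1 GB in the trade-off table; a real checkpoint can still land on either side of the estimate. Raising `group_size` shows how the per-group overhead shrinks as groups grow.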