Add Gemma 4 benchmark results
README.md
CHANGED
@@ -118,6 +118,21 @@ The cache was the experiment. madvise was the answer.
| madvise prefetch | 0.57 | Explicit kernel prefetch hints |
| LRU cache (5 GB) | 0.24 | Duplicate data in user-space heap |
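The "explicit kernel prefetch hints" in the madvise row above can be sketched with Python's `mmap` module. This is an illustrative placeholder, not the project's actual code: the backing file, the expert size, and the `prefetch_expert` helper are all assumptions made up for the demo.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
EXPERT_BYTES = 4 * PAGE  # stand-in for one expert tensor's size (page-aligned)

# Stand-in for an mmapped GGUF weight file.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(8 * EXPERT_BYTES))
    path = f.name

fd = os.open(path, os.O_RDONLY)
mm = mmap.mmap(fd, 0, prot=mmap.PROT_READ)

def prefetch_expert(idx: int) -> None:
    """Hint the kernel to start faulting in expert `idx`'s weight region
    before the matmul that needs it (async readahead, not a blocking read)."""
    offset = idx * EXPERT_BYTES  # page-aligned because EXPERT_BYTES is
    mm.madvise(mmap.MADV_WILLNEED, offset, EXPERT_BYTES)

prefetch_expert(3)                  # kernel may begin readahead here
_ = mm[3 * EXPERT_BYTES]            # later access ideally hits warm page cache

mm.close()
os.close(fd)
os.remove(path)
```

The same `MADV_WILLNEED` hint is what a C implementation would issue via `madvise(2)` between routing and the expert matmul.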

## Gemma 4-26B-A4B — MoE Sparsity Benchmark

Google Gemma 4 has 128 experts with top-8 routing (4B active of 26B total). Tested at multiple quantization levels on Apple Silicon:

| Hardware | Quant | Model size | RAM | Speed | Notes |
|----------|-------|------------|-----|-------|-------|
| M2 MacBook Air | IQ2_M | 9.3 GB | 8 GB | **1.37 tok/s** | Model exceeds RAM; MoE sparsity prevents thrash |
| M4 Mac Mini | IQ2_M | 9.3 GB | 16 GB | **36.5 tok/s** | Fits in RAM, full GPU speed |
| M4 Mac Mini | Q4_K_M | 16.9 GB | 16 GB | **5.18 tok/s** | Exceeds RAM, still runs smoothly |
| M4 Mac Mini | Q8_0 | ~27 GB | 16 GB | *testing* | 1.7x oversubscription |
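The top-8-of-128 routing described above can be sketched in plain Python. This is a generic renormalized top-k softmax, not Gemma's actual router implementation; the logits here are random stand-ins.

```python
import math
import random

N_EXPERTS, TOP_K = 128, 8  # Gemma 4's routing shape

def route(logits: list[float]) -> list[tuple[int, float]]:
    """Pick the top-k experts by logit and renormalize their softmax weights."""
    top = sorted(range(N_EXPERTS), key=lambda i: logits[i], reverse=True)[:TOP_K]
    m = max(logits)
    exps = {i: math.exp(logits[i] - m) for i in top}
    z = sum(exps.values())
    return [(i, exps[i] / z) for i in top]

random.seed(0)
chosen = route([random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)])
# Only 8 of 128 experts' weights are touched per token (~6% of expert
# parameters; shared layers bring the overall activation ratio to 15.4%).
```

Per token, only the pages holding those eight experts need to be resident, which is why the page cache can keep up even when the full model exceeds RAM.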

All results: stock llama.cpp with mmap, no madvise. Canberra verified on all configs.

**Finding:** Gemma 4's low activation ratio (15.4%) lets the OS page cache handle memory pressure without explicit madvise. The madvise sniper is most valuable for denser MoE models (Qwen 35B), where the per-token working set overwhelms the page cache.
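The ratios quoted in the finding can be sanity-checked against the table's own numbers:

```python
# All figures come from the benchmark table and finding above.
active_ratio = 4 / 26            # 4B active parameters of 26B total
assert round(active_ratio * 100, 1) == 15.4   # the quoted activation ratio

oversub_q8 = 27 / 16             # ~27 GB Q8_0 model on a 16 GB Mac Mini
assert round(oversub_q8, 1) == 1.7            # the "1.7x oversubscription" row
```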

## Related

- **MLX Expert Sniper** (Apple Silicon, 5.4 tok/s on 35B): [huggingface.co/waltgrace/mlx-expert-sniper](https://huggingface.co/waltgrace/mlx-expert-sniper)