Q8 result: thrashes on 16 GB (CPU_REPACK doubles memory)
README.md
CHANGED
@@ -127,7 +127,7 @@ Google Gemma 4 has 128 experts with top-8 routing (4B active of 26B total). Test
 | M2 MacBook Air | IQ2_M | 9.3 GB | 8 GB | **1.37 tok/s** | Model exceeds RAM, MoE sparsity prevents thrash |
 | M4 Mac Mini | IQ2_M | 9.3 GB | 16 GB | **36.5 tok/s** | Fits in RAM, full GPU speed |
 | M4 Mac Mini | Q4_K_M | 16.9 GB | 16 GB | **5.18 tok/s** | Exceeds RAM, still runs smoothly |
-| M4 Mac Mini | Q8_0 |
+| M4 Mac Mini | Q8_0 | 26.9 GB | 16 GB | **0 tok/s (thrash)** | CPU_REPACK doubles memory to 51 GB, can't load |
 
 All results: stock llama.cpp with mmap, no madvise. Canberra verified on all configs.
 
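
The sizes in the table rows above can be turned into a rough oversubscription ratio (memory needed divided by physical RAM). The sketch below is illustrative only: `oversubscription` is a hypothetical helper, not part of llama.cpp, and the 51 GB figure is taken as reported in the new Q8_0 row rather than derived. The ratio by itself does not predict throughput, since the rows slightly above 1.0 still run, but it shows how far the repacked Q8_0 case sits from the others.

```python
# Sizes in GB, copied from the table rows in the diff above.
ROWS = [
    ("M2 MacBook Air", "IQ2_M", 9.3, 8),
    ("M4 Mac Mini", "IQ2_M", 9.3, 16),
    ("M4 Mac Mini", "Q4_K_M", 16.9, 16),
    ("M4 Mac Mini", "Q8_0", 26.9, 16),
]


def oversubscription(footprint_gb, ram_gb):
    """Hypothetical helper: ratio of memory needed to physical RAM.

    Values above 1.0 mean the working set cannot be fully resident.
    """
    return footprint_gb / ram_gb


for machine, quant, model_gb, ram_gb in ROWS:
    ratio = oversubscription(model_gb, ram_gb)
    print(f"{machine:15s} {quant:7s} {ratio:.2f}x RAM")

# The Q8_0 row reports a 51 GB footprint once CPU_REPACK is involved.
q8_repacked = oversubscription(51, 16)
print(f"Q8_0 at the reported 51 GB footprint: {q8_repacked:.2f}x RAM")
```

Comparing the file-size ratio with the repacked ratio for Q8_0 makes the commit's point concrete: the file alone already exceeds RAM, and the reported footprint is several times the 16 GB available.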