waltgrace/llama-cpp-expert-sniper

@@ -127,7 +127,7 @@ Google Gemma 4 has 128 experts with top-8 routing (4B active of 26B total). Test
 | M2 MacBook Air | IQ2_M | 9.3 GB | 8 GB | **1.37 tok/s** | Model exceeds RAM, MoE sparsity prevents thrash |
 | M4 Mac Mini | IQ2_M | 9.3 GB | 16 GB | **36.5 tok/s** | Fits in RAM, full GPU speed |
 | M4 Mac Mini | Q4_K_M | 16.9 GB | 16 GB | **5.18 tok/s** | Exceeds RAM, still runs smoothly |
-| M4 Mac Mini | Q8_0 | ~27 GB | 16 GB | *testing* | 1.7x oversubscription |
+| M4 Mac Mini | Q8_0 | 26.9 GB | 16 GB | **0 tok/s (thrash)** | CPU_REPACK doubles memory to 51 GB, can't load |
 All results: stock llama.cpp with mmap, no madvise. Canberra verified on all configs.