waltgrace committed 8270847 (verified) · 1 parent: ae909ec

Q8 result: thrashes on 16 GB (CPU_REPACK doubles memory)

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -127,7 +127,7 @@ Google Gemma 4 has 128 experts with top-8 routing (4B active of 26B total). Test
  | M2 MacBook Air | IQ2_M | 9.3 GB | 8 GB | **1.37 tok/s** | Model exceeds RAM, MoE sparsity prevents thrash |
  | M4 Mac Mini | IQ2_M | 9.3 GB | 16 GB | **36.5 tok/s** | Fits in RAM, full GPU speed |
  | M4 Mac Mini | Q4_K_M | 16.9 GB | 16 GB | **5.18 tok/s** | Exceeds RAM, still runs smoothly |
- | M4 Mac Mini | Q8_0 | ~27 GB | 16 GB | *testing* | 1.7x oversubscription |
+ | M4 Mac Mini | Q8_0 | 26.9 GB | 16 GB | **0 tok/s (thrash)** | CPU_REPACK doubles memory to 51 GB, can't load |
 
  All results: stock llama.cpp with mmap, no madvise. Canberra verified on all configs.
 
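The old "1.7x oversubscription" note and the new 51 GB CPU_REPACK figure are both ratios of model footprint to physical RAM. Below is a minimal Python sketch of that arithmetic. It assumes the footprint equals the file size listed in the table (plus the commit's 51 GB post-repack figure) and ignores KV cache, the OS and other processes. The `Run` class and `RUNS` list are hypothetical names for illustration only, not part of llama.cpp.

```python
# Oversubscription ratio = model footprint / physical RAM, using the sizes
# from the results table. Assumption: footprint == listed model size; KV
# cache, OS and other processes are ignored, so this understates pressure.
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    machine: str
    quant: str
    footprint_gb: float  # model size, or the repacked size where noted
    ram_gb: float

    @property
    def ratio(self) -> float:
        """How many times the footprint exceeds physical RAM."""
        return self.footprint_gb / self.ram_gb


RUNS = [
    Run("M2 MacBook Air", "IQ2_M", 9.3, 8),
    Run("M4 Mac Mini", "IQ2_M", 9.3, 16),
    Run("M4 Mac Mini", "Q4_K_M", 16.9, 16),
    Run("M4 Mac Mini", "Q8_0", 26.9, 16),
    # Q8_0 after CPU_REPACK, per the commit message: ~51 GB total
    Run("M4 Mac Mini", "Q8_0 + CPU_REPACK", 51.0, 16),
]

if __name__ == "__main__":
    for r in RUNS:
        status = "fits" if r.ratio <= 1 else "oversubscribed"
        print(f"{r.machine:15} {r.quant:18} {r.ratio:4.2f}x  {status}")
    # Growth factor implied by the repack figure (51 GB vs 26.9 GB file)
    print(f"repack growth: {51.0 / 26.9:.2f}x")
```

The Q8_0 row comes out near 1.68x, matching the earlier "1.7x" note. The 51 GB repacked footprint is about 1.9x the 26.9 GB file, or roughly 3.2x the Mac Mini's 16 GB of RAM.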