EPYC 9355 CPU-only sweep-bench
#6
by sousekd - opened
People on Reddit sometimes ask about EPYC CPU-only performance. My GPUs are currently out of order, so here are CPU-only results from a single EPYC 9355 (Turin, 12x DDR5-6400) running GLM-4.7-Flash IQ5_K with ik_llama.cpp:

```bash
./llama-sweep-bench \
    --model "$MODEL_PATH" \
    --no-mmap --merge-qkv \
    -mla 3 -amb 512 \
    -b 2048 -ub 1024 \
    -ctk f16 -ctv f16 -c 131072 \
    --threads 20 \
    --threads-batch 30 \
    --warmup-batch \
    -n 128
```

| PP | TG | N_KV | T_PP s | S_PP t/s | T_TG s | S_TG t/s |
|---|---|---|---|---|---|---|
| 1024 | 128 | 0 | 1.118 | 916.26 | 1.650 | 77.58 |
| 1024 | 128 | 31744 | 11.758 | 87.09 | 4.543 | 28.18 |
| 1024 | 128 | 64512 | 22.428 | 45.66 | 7.834 | 16.34 |
| 1024 | 128 | 97280 | 33.221 | 30.82 | 11.212 | 11.42 |
| 1024 | 128 | 130048 | 43.154 | 23.73 | 14.239 | 8.99 |
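
For anyone who wants to sanity-check or reuse these numbers, here is a minimal Python sketch. It assumes the speed columns are simply tokens divided by elapsed seconds (S_PP = PP / T_PP, S_TG = TG / T_TG), which matches the rows above to within rounding. The helper names (`check_rows`, `relative_tg_speed`) are made up for illustration and are not part of ik_llama.cpp.

```python
# Rows copied from the table above:
# (PP, TG, N_KV, T_PP s, S_PP t/s, T_TG s, S_TG t/s)
ROWS = [
    (1024, 128, 0, 1.118, 916.26, 1.650, 77.58),
    (1024, 128, 31744, 11.758, 87.09, 4.543, 28.18),
    (1024, 128, 64512, 22.428, 45.66, 7.834, 16.34),
    (1024, 128, 97280, 33.221, 30.82, 11.212, 11.42),
    (1024, 128, 130048, 43.154, 23.73, 14.239, 8.99),
]


def check_rows(rows, rel_tol=1e-3):
    """Return True if every reported speed equals tokens / seconds within rel_tol.

    The tolerance absorbs the rounding of the timings to three decimals.
    """
    for pp, tg, _n_kv, t_pp, s_pp, t_tg, s_tg in rows:
        if abs(pp / t_pp - s_pp) > rel_tol * s_pp:
            return False
        if abs(tg / t_tg - s_tg) > rel_tol * s_tg:
            return False
    return True


def relative_tg_speed(rows):
    """Return (N_KV, S_TG / S_TG at the first row) for each row."""
    base = rows[0][6]
    return [(row[2], row[6] / base) for row in rows]


if __name__ == "__main__":
    print("columns consistent:", check_rows(ROWS))
    for n_kv, ratio in relative_tg_speed(ROWS):
        print(f"N_KV={n_kv:>6}: TG at {ratio:.1%} of empty-context speed")
```

Going by the table, token generation at N_KV = 130048 runs at roughly 12% of the empty-context speed (8.99 vs 77.58 t/s).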

