Fix critical-path bandwidth figure: naive is 9.00 MB/token (not 16.50); 9.00/4.50=2.00x
Browse files
README.md
CHANGED
|
@@ -57,7 +57,7 @@ Rust.
|
|
| 57 |
|---|---|
|
| 58 |
| CPU decode throughput (~31B / 16-layer, Q4, 32 threads) | **5.94 tok/s** |
|
| 59 |
| Effective memory bandwidth | 61 GB/s (30% of 204.8 GB/s peak) |
|
| 60 |
-
| Bandwidth reduction from pipelining | **2.00x** (
|
| 61 |
| Test perplexity (114M, TinyStories, 10K steps) | 6.50 |
|
| 62 |
| Val perplexity (8.34B / 4-layer, TinyStories, 10K steps) | 4.52 |
|
| 63 |
|
|
|
|
| 57 |
|---|---|
|
| 58 |
| CPU decode throughput (~31B / 16-layer, Q4, 32 threads) | **5.94 tok/s** |
|
| 59 |
| Effective memory bandwidth | 61 GB/s (30% of 204.8 GB/s peak) |
|
| 60 |
+
| Bandwidth reduction from pipelining | **2.00x** (9.00 → 4.50 MB/token) |
|
| 61 |
| Test perplexity (114M, TinyStories, 10K steps) | 6.50 |
|
| 62 |
| Val perplexity (8.34B / 4-layer, TinyStories, 10K steps) | 4.52 |
|
| 63 |
|