FoolDev Claude Opus 4.7 committed on
Commit bec5589 · 1 Parent(s): 1b15737

README: back the "27B slower per token" claim with measured numbers

The "Why a 27B variant?" prose claimed the dense 27B is slower per
token than the MoE 35B-A3B without showing receipts. Measured both on
the same Ryzen AI Max+ 395 / Radeon 8060S iGPU via make bench:

Janus-27B Q3_K_S: ~10 tok/s
Janus-35B ~Q4: ~27 tok/s

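The ~2.7x gap points the way you'd expect from ~3B active vs 27B
active params, though it is well short of the 9x active-parameter
ratio, and the two runs use different quantizations (Q3_K_S vs ~Q4).
Inlined the data point so the claim has a citable number right where
it's made.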

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
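
As a rough check of the arithmetic in the message above, the sketch below recomputes the throughput ratio from the two quoted figures and sets it beside the ratio of active parameter counts. The constant and function names are hypothetical and exist only for illustration; the numbers are the approximate values quoted in the commit message and README prose, and nothing here reproduces what `make bench` actually runs.

```python
# Approximate throughput figures quoted in the commit message (tok/s).
DENSE_27B_TOK_S = 10.0  # Janus-27B at Q3_K_S
MOE_35B_TOK_S = 27.0    # Janus-35B at ~Q4

# Active parameters per token, in billions, as stated in the README prose.
DENSE_ACTIVE_B = 27.0
MOE_ACTIVE_B = 3.0


def speedup(fast_tok_s: float, slow_tok_s: float) -> float:
    """Return how many times faster the first throughput is than the second."""
    if slow_tok_s <= 0:
        raise ValueError("throughput must be positive")
    return fast_tok_s / slow_tok_s


measured = speedup(MOE_35B_TOK_S, DENSE_27B_TOK_S)
active_ratio = DENSE_ACTIVE_B / MOE_ACTIVE_B

print(f"measured speedup: {measured:.1f}x")             # 2.7x
print(f"active-parameter ratio: {active_ratio:.1f}x")   # 9.0x
```

Because the two models were measured at different quantization levels, the measured figure is best read as a single data point for this hardware rather than a like-for-like comparison.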

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -81,7 +81,7 @@ this architecture upstream (see [Vision](#vision)).
 
  The 35B-A3B is a sparse mixture-of-experts model: 35B parameters total but only ~3B active per token. That makes it fast at inference but **memory-hungry at load time** — the full 35B has to live in VRAM/RAM even though only 3B is doing useful work each step.
 
- The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B (no sparse advantage), but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
+ The 27B is **dense**: every parameter participates in every forward pass. It's slower per token than 35B-A3B — on a Ryzen AI Max+ 395 / Radeon 8060S iGPU the dense 27B at Q3_K_S clocks ~10 tok/s, versus ~27 tok/s for the MoE 35B at ~Q4 (`make bench`, 3-prompt mix) but the working set fits comfortably on commodity GPUs and avoids the MoE-specific load-balance failure modes.
 
  | | Janus-27B (this) | [Janus-35B](https://huggingface.co/FoolDev/janus) |
  |---|---|---|