keep-warm pulse: hold GPU boost clock while typing so casual chat decodes at boosted rate 70592b3 verified Humuhumu33 committed about 2 hours ago
warmup uses decode() to precompile batched pipelines (cut first-msg TTFT) e4b6a8f verified Humuhumu33 committed about 3 hours ago
fast first turn: warmup boosts clock + primes system-prompt KV (12s TTFT to ~0.5s) + static grounded greeting 7efaf4b verified Humuhumu33 committed about 3 hours ago
discrete-GPU validation ?bench=discrete: bandwidth + live + spec-flip in one page dabcb0d verified Humuhumu33 committed about 3 hours ago
per-pass GPU trace ?bench=trace: name the non-weight overhead 770541b verified Humuhumu33 committed about 4 hours ago
spec bench: warmup + short prompt + 192-tok decode-dominated (fix prefill confound) 439999f verified Humuhumu33 committed about 4 hours ago
live decode profile ?bench=perf: boosted-clock steady tok/s vs roofline e74320a verified Humuhumu33 committed about 4 hours ago
spec-decode for BitNet: subNorm+f32-KV batched verify, ?spec + ?bench=spec 7406205 verified Humuhumu33 committed about 6 hours ago