per-pass GPU trace ?bench=trace: name the non-weight overhead 31a05bf verified Humuhumu33 committed about 11 hours ago
spec-decode for BitNet: subNorm+f32-KV batched verify, ?spec + ?bench=spec 459725d verified Humuhumu33 committed about 14 hours ago