The Strong Lottery Ticket Hypothesis for Multi-Head Attention Mechanisms • Paper • 2511.04217 • Published Nov 6 • 16
Rethinking Optimal Verification Granularity for Compute-Efficient Test-Time Scaling • Paper • 2505.11730 • Published May 16 • 5