Trust Functions: Near-Lossless Weak-to-Strong Generalization by Learning When to Trust the Weak Teacher Paper • 2606.01000 • Published 27 days ago • 6
XTR Replicability Collection All the models used in experiments from "A Replicability Study of XTR" • 16 items • Updated May 5 • 7
The Alignment Waltz: Jointly Training Agents to Collaborate for Safety Paper • 2510.08240 • Published Oct 9, 2025 • 41
The Flaw of Averages: Quantifying Uniformity of Performance on Benchmarks Paper • 2509.25671 • Published Sep 30, 2025 • 6