Mitigating Safety Tax via Distribution-Grounded Refinement in Large Reasoning Models Paper • 2602.02136 • Published 2 days ago
UltraHorizon: Benchmarking Agent Capabilities in Ultra Long-Horizon Scenarios Paper • 2509.21766 • Published Sep 26, 2025