OpenOOD v1.5: Enhanced Benchmark for Out-of-Distribution Detection Paper • 2306.09301 • Published Jun 15, 2023 • 1
Scattered Forest Search: Smarter Code Space Exploration with LLMs Paper • 2411.05010 • Published Oct 22, 2024 • 1
Climbing the Ladder of Reasoning: What LLMs Can-and Still Can't-Solve after SFT? Paper • 2504.11741 • Published Apr 16, 2025 • 1
OMEGA: Can LLMs Reason Outside the Box in Math? Evaluating Exploratory, Compositional, and Transformative Generalization Paper • 2506.18880 • Published Jun 23, 2025 • 4
Can Aha Moments Be Fake? Identifying True and Decorative Thinking Steps in Chain-of-Thought Paper • 2510.24941 • Published Oct 28, 2025 • 4
Unsafer in Many Turns: Benchmarking and Defending Multi-Turn Safety Risks in Tool-Using Agents Paper • 2602.13379 • Published Feb 13, 2026 • 3
Rethinking Domain Generalization for Face Anti-spoofing: Separability and Alignment Paper • 2303.13662 • Published Mar 23, 2023