FlagEval Findings Report: A Preliminary Evaluation of Large Reasoning Models on Automatically Verifiable Textual and Visual Questions Paper • 2509.17177 • Published Sep 21, 2025
BlendX: Complex Multi-Intent Detection with Blended Patterns Paper • 2403.18277 • Published Mar 27, 2024