Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification Paper • 2601.22642 • Published 4 days ago • 8
Pushing the Boundaries of Natural Reasoning: Interleaved Bonus from Formal-Logic Verification Paper • 2601.22642 • Published 4 days ago • 8
MedInsightBench: Evaluating Medical Analytics Agents Through Multi-Step Insight Discovery in Multimodal Medical Data Paper • 2512.13297 • Published Dec 15, 2025
Measuring Hong Kong Massive Multi-Task Language Understanding Paper • 2505.02177 • Published May 4, 2025 • 1
Towards Advanced Mathematical Reasoning for LLMs via First-Order Logic Theorem Proving Paper • 2506.17104 • Published Jun 20, 2025 • 2
SafeLawBench: Towards Safe Alignment of Large Language Models Paper • 2506.06636 • Published Jun 7, 2025 • 1