PRISM: Pushing the Frontier of Deep Think via Process Reward Model-Guided Inference • Paper • 2603.02479 • Published 11 days ago • 18
FinDVer: Explainable Claim Verification over Long and Hybrid-Content Financial Documents • Paper • 2411.05764 • Published Nov 8, 2024
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research • Paper • 2507.13300 • Published Jul 17, 2025 • 20
SealQA: Raising the Bar for Reasoning in Search-Augmented Language Models • Paper • 2506.01062 • Published Jun 1, 2025 • 5
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding • Paper • 2501.12380 • Published Jan 21, 2025 • 84