Rethinking Composed Image Retrieval Evaluation: A Fine-Grained Benchmark from Image Editing Paper • 2601.16125 • Published 10 days ago • 13
ArenaRL: Scaling RL for Open-Ended Agents via Tournament-based Relative Ranking Paper • 2601.06487 • Published 22 days ago • 52
InfiniDepth: Arbitrary-Resolution and Fine-Grained Depth Estimation with Neural Implicit Fields Paper • 2601.03252 • Published 26 days ago • 101
Towards Universal Video Retrieval: Generalizing Video Embedding via Synthesized Multimodal Pyramid Curriculum Paper • 2510.27571 • Published Oct 31, 2025 • 18
LimRank: Less is More for Reasoning-Intensive Information Reranking Paper • 2510.23544 • Published Oct 27, 2025 • 9
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences Paper • 2510.23451 • Published Oct 27, 2025 • 28
E²Rank: Your Text Embedding can Also be an Effective and Efficient Listwise Reranker Paper • 2510.22733 • Published Oct 26, 2025 • 32
MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval Paper • 2510.09510 • Published Oct 10, 2025 • 8
ReSum: Unlocking Long-Horizon Search Intelligence via Context Summarization Paper • 2509.13313 • Published Sep 16, 2025 • 80
Towards General Agentic Intelligence via Environment Scaling Paper • 2509.13311 • Published Sep 16, 2025 • 71
WebSailor-V2: Bridging the Chasm to Proprietary Agents via Synthetic Data and Scalable Reinforcement Learning Paper • 2509.13305 • Published Sep 16, 2025 • 91
WebResearcher: Unleashing unbounded reasoning capability in Long-Horizon Agents Paper • 2509.13309 • Published Sep 16, 2025 • 67
WebWeaver: Structuring Web-Scale Evidence with Dynamic Outlines for Open-Ended Deep Research Paper • 2509.13312 • Published Sep 16, 2025 • 105
WebWatcher: Breaking New Frontier of Vision-Language Deep Research Agent Paper • 2508.05748 • Published Aug 7, 2025 • 141
WebShaper: Agentically Data Synthesizing via Information-Seeking Formalization Paper • 2507.15061 • Published Jul 20, 2025 • 60
AbGen: Evaluating Large Language Models in Ablation Study Design and Evaluation for Scientific Research Paper • 2507.13300 • Published Jul 17, 2025 • 20
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers Paper • 2507.06223 • Published Jul 8, 2025 • 14
Can LLMs Identify Critical Limitations within Scientific Research? A Systematic Evaluation on AI Research Papers Paper • 2507.02694 • Published Jul 3, 2025 • 19
SciArena: An Open Evaluation Platform for Foundation Models in Scientific Literature Tasks Paper • 2507.01001 • Published Jul 1, 2025 • 46