MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants Paper • 2603.09652 • Published 3 days ago • 12
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published 3 days ago • 36
SWE-rebench V2: Language-Agnostic SWE Task Collection at Scale Paper • 2602.23866 • Published 14 days ago • 83
Search More, Think Less: Rethinking Long-Horizon Agentic Search for Efficiency and Generalization Paper • 2602.22675 • Published 15 days ago • 22
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 28 days ago • 54
REDSearcher: A Scalable and Cost-Efficient Framework for Long-Horizon Search Agents Paper • 2602.14234 • Published 26 days ago • 26
EcoGym: Evaluating LLMs for Long-Horizon Plan-and-Execute in Interactive Economies Paper • 2602.09514 • Published Feb 10 • 10
Step 3.5 Flash: Open Frontier-Level Intelligence with 11B Active Parameters Paper • 2602.10604 • Published about 1 month ago • 189
TimeChat-Captioner: Scripting Multi-Scene Videos with Time-Aware and Structural Audio-Visual Captions Paper • 2602.08711 • Published Feb 9 • 28
OPUS: Towards Efficient and Principled Data Selection in Large Language Model Pre-training in Every Iteration Paper • 2602.05400 • Published Feb 5 • 347
OmniVideo-R1: Reinforcing Audio-visual Reasoning with Query Intention and Modality Attention Paper • 2602.05847 • Published Feb 5 • 12
Retrieval-Infused Reasoning Sandbox: A Benchmark for Decoupling Retrieval and Reasoning Capabilities Paper • 2601.21937 • Published Jan 29 • 19
OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models Paper • 2602.04804 • Published Feb 4 • 46
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration Paper • 2602.04575 • Published Feb 4 • 17
Vibe AIGC: A New Paradigm for Content Generation via Agentic Orchestration Paper • 2602.04575 • Published Feb 4 • 17
SWE-Universe: Scale Real-World Verifiable Environments to Millions Paper • 2602.02361 • Published Feb 2 • 60