AI Idea Bench 2025: AI Research Idea Generation Benchmark Paper • 2504.14191 • Published Apr 19, 2025
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams Paper • 2508.06851 • Published Aug 9, 2025
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published Aug 22, 2025 • 4
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 216
Symbolic Graphics Programming with Large Language Models Paper • 2509.05208 • Published Sep 5, 2025 • 47
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15, 2025 • 107
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning Paper • 2511.01833 • Published Nov 3, 2025 • 16
InternSpatial: A Comprehensive Dataset for Spatial Reasoning in Vision-Language Models Paper • 2506.18385 • Published Jun 23, 2025
Dialogue as Discovery: Navigating Human Intent Through Principled Inquiry Paper • 2510.27410 • Published Oct 31, 2025
SVBench: Evaluation of Video Generation Models on Social Reasoning Paper • 2512.21507 • Published Dec 25, 2025 • 8
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published Dec 26, 2025 • 61
ProSoftArena: Benchmarking Hierarchical Capabilities of Multimodal Agents in Professional Software Environments Paper • 2601.02399 • Published Dec 30, 2025
Focal Guidance: Unlocking Controllability from Semantic-Weak Layers in Video Diffusion Models Paper • 2601.07287 • Published Jan 12, 2026 • 5
MeepleLM: A Virtual Playtester Simulating Diverse Subjective Experiences Paper • 2601.07251 • Published Jan 12, 2026 • 11
World Craft: Agentic Framework to Create Visualizable Worlds via Text Paper • 2601.09150 • Published Jan 14, 2026 • 19
LongCLI-Bench: A Preliminary Benchmark and Study for Long-horizon Agentic Programming in Command-Line Interfaces Paper • 2602.14337 • Published Feb 15, 2026 • 13
PyVision-RL: Forging Open Agentic Vision Models via RL Paper • 2602.20739 • Published 29 days ago • 31
OmniCustom: Sync Audio-Video Customization Via Joint Audio-Video Generation Model Paper • 2602.12304 • Published Feb 12, 2026