AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents Paper • 2607.02255 • Published 2 days ago • 39
AgenticSTS: A Bounded-Memory Testbed for Long-Horizon LLM Agents Paper • 2607.02255 • Published 2 days ago • 39
JAMER: Project-Level Code Framework Dataset and Benchmark on Professional Game Engines Paper • 2606.19830 • Published 16 days ago • 3
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published Apr 23 • 36
WorldMark: A Unified Benchmark Suite for Interactive Video World Models Paper • 2604.21686 • Published Apr 23 • 36
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published Mar 26 • 53
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published Mar 26 • 53
PackForcing: Short Video Training Suffices for Long Video Sampling and Long Context Inference Paper • 2603.25730 • Published Mar 26 • 53
MSA: Memory Sparse Attention for Efficient End-to-End Memory Model Scaling to 100M Tokens Paper • 2603.23516 • Published Mar 6 • 53
AI Idea Bench 2025: AI Research Idea Generation Benchmark Paper • 2504.14191 • Published Apr 19, 2025
MDK12-Bench: A Comprehensive Evaluation of Multimodal Large Language Models on Multidisciplinary Exams Paper • 2508.06851 • Published Aug 9, 2025
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published Aug 22, 2025 • 4
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25, 2025 • 224
Symbolic Graphics Programming with Large Language Models Paper • 2509.05208 • Published Sep 5, 2025 • 47
OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling Paper • 2509.12201 • Published Sep 15, 2025 • 107