Confucius Code Agent: An Open-sourced AI Software Engineer at Industrial Scale Paper • 2512.10398 • Published about 23 hours ago • 2
Long-horizon Reasoning Agent for Olympiad-Level Mathematical Problem Solving Paper • 2512.10739 • Published about 16 hours ago • 28
RefineBench: Evaluating Refinement Capability of Language Models via Checklists Paper • 2511.22173 • Published 15 days ago • 12
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published 18 days ago • 55
OmniScientist: Toward a Co-evolving Ecosystem of Human and AI Scientists Paper • 2511.16931 • Published 21 days ago • 6
WorldGen: From Text to Traversable and Interactive 3D Worlds Paper • 2511.16825 • Published 21 days ago • 21
O-Mem: Omni Memory System for Personalized, Long Horizon, Self-Evolving Agents Paper • 2511.13593 • Published 25 days ago • 24
OpenMMReasoner: Pushing the Frontiers for Multimodal Reasoning with an Open and General Recipe Paper • 2511.16334 • Published 22 days ago • 91
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs Paper • 2511.16664 • Published 22 days ago • 25
Can World Simulators Reason? Gen-ViRe: A Generative Visual Reasoning Benchmark Paper • 2511.13853 • Published 25 days ago • 34
ResearchRubrics: A Benchmark of Prompts and Rubrics For Evaluating Deep Research Agents Paper • 2511.07685 • Published Nov 10 • 9
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation Paper • 2511.09057 • Published about 1 month ago • 75
Adaptive Multi-Agent Response Refinement in Conversational Systems Paper • 2511.08319 • Published about 1 month ago • 40
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale Paper • 2511.05705 • Published Nov 7 • 6