Beyond Static Leaderboards: Predictive Validity for the Evaluation of LLM Agents Paper • 2606.19704 • Published 7 days ago • 39
Moebius: 0.2B Lightweight Image Inpainting Framework with 10B-Level Performance Paper • 2606.19195 • Published 8 days ago • 132
LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling Paper • 2606.18023 • Published 9 days ago • 203
Welcome NVIDIA Cosmos 3: The First Open Omni-model for Physical AI Reasoning and Action Article • nvidia • 24 days ago • 83
Cosmos3-Nano 🌌 (NVIDIA Cosmos3-Nano: text/image to video + audio) • Running on Zero • Agents • Featured • 57
Toward Generalist Autonomous Research via Hypothesis-Tree Refinement Paper • 2606.11926 • Published 15 days ago • 117
AnchorWorld: Embodied Egocentric World Simulation with View-based Evolution Customization Paper • 2606.07326 • Published 20 days ago • 29
SoCRATES: Towards Reliable Automated Evaluation of Proactive LLM Mediation across Domains and Socio-cognitive Variations Paper • 2606.05563 • Published 21 days ago • 53