Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling Paper • 2604.28185 • Published 7 days ago • 86
GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents Paper • 2604.26752 • Published 8 days ago • 97
World-R1: Reinforcing 3D Constraints for Text-to-Video Generation Paper • 2604.24764 • Published 10 days ago • 116
ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published 11 days ago • 33
MultiWorld: Scalable Multi-Agent Multi-View Video World Models Paper • 2604.18564 • Published 17 days ago • 45
Elucidating the SNR-t Bias of Diffusion Probabilistic Models Paper • 2604.16044 • Published 20 days ago • 74
ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents Paper • 2604.11784 • Published 24 days ago • 143
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published 24 days ago • 71
Gen-Searcher: Reinforcing Agentic Search for Image Generation Paper • 2603.28767 • Published Mar 30 • 58
Lost in Stories: Consistency Bugs in Long Story Generation by LLMs Paper • 2603.05890 • Published Mar 6 • 93
HiAR: Efficient Autoregressive Long Video Generation via Hierarchical Denoising Paper • 2603.08703 • Published Mar 9 • 32
VisGym: Diverse, Customizable, Scalable Environments for Multimodal Agents Paper • 2601.16973 • Published Jan 23 • 40
DanQing: An Up-to-Date Large-Scale Chinese Vision-Language Pre-training Dataset Paper • 2601.10305 • Published Jan 15 • 36
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8 • 170
NitroGen: An Open Foundation Model for Generalist Gaming Agents Paper • 2601.02427 • Published Jan 4 • 46