GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 29 days ago • 221
Mind-Brush: Integrating Agentic Cognitive Search and Reasoning into Image Generation Paper • 2602.01756 • Published 5 days ago • 22
3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation Paper • 2602.03796 • Published 3 days ago • 49
PaperBanana: Automating Academic Illustration for AI Scientists Paper • 2601.23265 • Published 7 days ago • 132
DynamicVLA: A Vision-Language-Action Model for Dynamic Object Manipulation Paper • 2601.22153 • Published 8 days ago • 68
view article Article Transformers v5: Simple model definitions powering the AI ecosystem +2 Dec 1, 2025 • 291
Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation Paper • 2511.14993 • Published Nov 19, 2025 • 231
view article Article Introducing smolagents: simple agents that write actions in code. +1 Dec 31, 2024 • 1.17k
VLA-Adapter: An Effective Paradigm for Tiny-Scale Vision-Language-Action Model Paper • 2509.09372 • Published Sep 11, 2025 • 246
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6, 2025 • 119
Voost: A Unified and Scalable Diffusion Transformer for Bidirectional Virtual Try-On and Try-Off Paper • 2508.04825 • Published Aug 6, 2025 • 60
MOSPA: Human Motion Generation Driven by Spatial Audio Paper • 2507.11949 • Published Jul 16, 2025 • 25
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models Paper • 2507.13344 • Published Jul 17, 2025 • 59
EmbRACE-3K: Embodied Reasoning and Action in Complex Environments Paper • 2507.10548 • Published Jul 14, 2025 • 37