HumanNet: Scaling Human-centric Video Learning to One Million Hours Paper • 2605.06747 • Published 7 days ago • 48
Seeing Fast and Slow: Learning the Flow of Time in Videos Paper • 2604.21931 • Published 21 days ago • 19
Article Nucleus-Image: Scaling Text-to-Image with Sparse Mixture of Experts NucleusAI • about 1 month ago • 11
OmniShow: Unifying Multimodal Conditions for Human-Object Interaction Video Generation Paper • 2604.11804 • Published Apr 13 • 72
Article `LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot` fracapuano, aractingi, lhoestq, CarolinePascal, pepijn223, jadechoghari, cadene, aliberts, AdilZtn, nepyope, imstevenpmwork • Sep 16, 2025 • 54
SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning Paper • 2603.23483 • Published Mar 24 • 62
Manifold-Aware Exploration for Reinforcement Learning in Video Generation Paper • 2603.21872 • Published Mar 23 • 33
Astrolabe: Steering Forward-Process Reinforcement Learning for Distilled Autoregressive Video Models Paper • 2603.17051 • Published Mar 17 • 109
Article Introducing Modular Diffusers - Composable Building Blocks for Diffusion Pipelines YiYiXu, OzzyGT, dn6, sayakpaul • Mar 5 • 51
Article Make your ZeroGPU Spaces go brrr with ahead-of-time compilation cbensimon, sayakpaul, linoyts, multimodalart • Sep 2, 2025 • 77
HiFi-Inpaint: Towards High-Fidelity Reference-Based Inpainting for Generating Detail-Preserving Human-Product Images Paper • 2603.02210 • Published Mar 2 • 29
Article ZeroGPU Spaces Acceleration in Practice: A Complete Guide to PyTorch Ahead-of-Time Compilation cbensimon, sayakpaul, linoyts, multimodalart • Sep 2, 2025 • 8
Helios Collection Helios: 14B Real-Time Long Video Generation Model can be Cheaper, Faster but Keep Stronger than 1.3B ones • 7 items • Updated Mar 15 • 24
FSVideo: Fast Speed Video Diffusion Model in a Highly-Compressed Latent Space Paper • 2602.02092 • Published Feb 2 • 18