Atlas: Orchestrating Heterogeneous Models and Tools for Multi-Domain Complex Reasoning Paper • 2601.03872 • Published 19 days ago • 42
LTX-2: Efficient Joint Audio-Visual Foundation Model Paper • 2601.03233 • Published 20 days ago • 135
Yume-1.5: A Text-Controlled Interactive World Generation Model Paper • 2512.22096 • Published about 1 month ago • 60
LLaDA2.0: Scaling Up Diffusion Language Models to 100B Paper • 2512.15745 • Published Dec 10, 2025 • 80
DeContext as Defense: Safe Image Editing in Diffusion Transformers Paper • 2512.16625 • Published Dec 18, 2025 • 25
IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning Paper • 2512.15635 • Published Dec 17, 2025 • 20
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length Paper • 2512.04677 • Published Dec 4, 2025 • 170
Back to Basics: Let Denoising Generative Models Denoise Paper • 2511.13720 • Published Nov 17, 2025 • 69
WAON Collection • WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models • 4 items • Updated Oct 28, 2025 • 1
WAON: Large-Scale and High-Quality Japanese Image-Text Pair Dataset for Vision-Language Models Paper • 2510.22276 • Published Oct 25, 2025 • 3
UltraGen: High-Resolution Video Generation with Hierarchical Attention Paper • 2510.18775 • Published Oct 21, 2025 • 18
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset Paper • 2510.15742 • Published Oct 17, 2025 • 51
RAE Collection • Collection for Diffusion Transformers with Representation Autoencoders • 1 item • Updated Oct 14, 2025 • 10