InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency • Paper • 2508.18265 • Published Aug 25, 2025 • 220
Video Analysis and Generation via a Semantic Progress Function • Paper • 2604.22554 • Published 13 days ago • 63
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation • Paper • 2604.24763 • Published 10 days ago • 68
Denoising, Fast and Slow: Difficulty-Aware Adaptive Sampling for Image Generation • Paper • 2604.19141 • Published 16 days ago • 1
LLaDA2.0-Uni: Unifying Multimodal Understanding and Generation with Diffusion Large Language Model • Paper • 2604.20796 • Published 15 days ago • 239
DiffEM: Learning from Corrupted Data with Diffusion Models via Expectation Maximization • Paper • 2510.12691 • Published Dec 20, 2025 • 1
OmniWeaving: Towards Unified Video Generation with Free-form Composition and Reasoning • Paper • 2603.24458 • Published Mar 25 • 9
The Latent Space: Foundation, Evolution, Mechanism, Ability, and Outlook • Paper • 2604.02029 • Published Apr 2 • 147