What Matters for Diffusion-Friendly Latent Manifold? Prior-Aligned Autoencoders for Latent Diffusion Paper • 2605.07915 • Published 9 days ago • 8
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration Paper • 2408.10605 • Published Aug 20, 2024 • 2
TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning Paper • 2410.19702 • Published Oct 25, 2024 • 1
VideoChat-A1: Thinking with Long Videos by Chain-of-Shot Reasoning Paper • 2506.06097 • Published Jun 6, 2025 • 1
LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents Paper • 2503.10200 • Published Mar 13, 2025
VideoChat-R1.5: Visual Test-Time Scaling to Reinforce Multimodal Reasoning by Iterative Perception Paper • 2509.21100 • Published Sep 25, 2025 • 1
UniFlow: A Unified Pixel Flow Tokenizer for Visual Understanding and Generation Paper • 2510.10575 • Published Oct 12, 2025 • 2
Beyond Textual CoT: Interleaved Text-Image Chains with Deep Confidence Reasoning for Image Editing Paper • 2510.08157 • Published Oct 9, 2025
VideoChat-M1: Collaborative Policy Planning for Video Understanding via Multi-Agent Reinforcement Learning Paper • 2511.19524 • Published Nov 24, 2025
Continuous-Time Distribution Matching for Few-Step Diffusion Distillation Paper • 2605.06376 • Published 10 days ago • 25