VLMs Need Words: Vision Language Models Ignore Visual Detail In Favor of Semantic Anchors Paper • 2604.02486 • Published Apr 2 • 12
DA-Flow: Degradation-Aware Optical Flow Estimation with Diffusion Models Paper • 2603.23499 • Published Mar 24 • 51
Grounding World Simulation Models in a Real-World Metropolis Paper • 2603.15583 • Published Mar 16 • 155
TAG: Tangential Amplifying Guidance for Hallucination-Resistant Diffusion Sampling Paper • 2510.04533 • Published Oct 6, 2025 • 48
MATRIX: Mask Track Alignment for Interaction-aware Video Generation Paper • 2510.07310 • Published Oct 8, 2025 • 36
Visual Representation Alignment for Multimodal Large Language Models Paper • 2509.07979 • Published Sep 9, 2025 • 84
Fine-Grained Perturbation Guidance via Attention Head Selection Paper • 2506.10978 • Published Jun 12, 2025 • 25
DreamMatcher: Appearance Matching Self-Attention for Semantically-Consistent Text-to-Image Personalization Paper • 2402.09812 • Published Feb 15, 2024 • 16