How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published 4 days ago • 16
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 68
Both Semantics and Reconstruction Matter: Making Representation Encoders Ready for Text-to-Image Generation and Editing Paper • 2512.17909 • Published Dec 19, 2025 • 37
Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models Paper • 2505.17015 • Published May 22, 2025 • 9