Direct 3D-Aware Object Insertion via Decomposed Visual Proxies Paper • 2606.06601 • Published 22 days ago • 26
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published about 1 month ago • 75
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 70
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 128
Unified Lexical Representation for Interpretable Visual-Language Alignment Paper • 2407.17827 • Published Jul 25, 2024 • 1