Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 10 days ago • 68
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale Paper • 2510.14979 • Published Oct 16, 2025 • 69
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 127
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9, 2025 • 127
Unified Lexical Representation for Interpretable Visual-Language Alignment Paper • 2407.17827 • Published Jul 25, 2024 • 1
DeepMesh: Auto-Regressive Artist-mesh Creation with Reinforcement Learning Paper • 2503.15265 • Published Mar 19, 2025 • 46