Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published 11 days ago • 68
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 4 items • Updated about 18 hours ago • 43
MMCORE: MultiModal COnnection with Representation Aligned Latent Embeddings Paper • 2604.19902 • Published 17 days ago • 2 • 2
Beyond Language Modeling: An Exploration of Multimodal Pretraining Paper • 2603.03276 • Published Mar 3 • 104
Has GPT-5 Achieved Spatial Intelligence? An Empirical Study Paper • 2508.13142 • Published Aug 18, 2025 • 34
ConsistCompose: Unified Multimodal Layout Control for Image Composition Paper • 2511.18333 • Published Nov 23, 2025 • 5
Scaling Spatial Intelligence with Multimodal Foundation Models Paper • 2511.13719 • Published Nov 17, 2025 • 49