NEO1_0 Collection From Pixels to Words -- Towards Native Vision-Language Primitives at Scale • 7 items • Updated about 13 hours ago • 7
DINOv3 Web 🦖 Space • Visualize rich, dense image features locally in your browser • 154
The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding Paper • 2512.19693 • Published Dec 22, 2025 • 64
Next-Embedding Prediction Makes Strong Vision Learners Paper • 2512.16922 • Published Dec 18, 2025 • 84
Fast and Accurate Causal Parallel Decoding using Jacobi Forcing Paper • 2512.14681 • Published Dec 16, 2025 • 41
LongVie 2: Multimodal Controllable Ultra-Long Video World Model Paper • 2512.13604 • Published Dec 15, 2025 • 74
sensenova/SenseNova-SI-1.1-Qwen2.5-VL-7B Image-Text-to-Text • 8B • Updated Dec 9, 2025 • 1.76k • 4
sensenova/SenseNova-SI-1.1-Qwen2.5-VL-3B Image-Text-to-Text • 4B • Updated Dec 9, 2025 • 1.82k • 4
sensenova/SenseNova-SI-1.2-InternVL3-8B Image-Text-to-Text • 8B • Updated Dec 10, 2025 • 4.23k • 10
sensenova/SenseNova-SI-1.1-Qwen3-VL-8B Image-Text-to-Text • 9B • Updated Dec 9, 2025 • 2.03k • 5