- Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders (arXiv:2603.19209, published 7 days ago)
- V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning (arXiv:2603.14482, published 11 days ago)
- Omnilingual MT: Machine Translation for 1,600 Languages (arXiv:2603.16309, published 10 days ago)
- Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections (arXiv:2603.12180, published 14 days ago)
- VidEoMT: Your ViT is Secretly Also a Video Segmentation Model (arXiv:2602.17807, published Feb 19)
- Causal-JEPA: Learning World Models through Object-Level Latent Interventions (arXiv:2602.11389, published Feb 11)
- UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders (arXiv:2601.17950, published Jan 25)
- TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration (arXiv:2601.04544, published Jan 8)
- CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion (arXiv:2512.19535, published Dec 22, 2025)
- Economies of Open Intelligence: Tracing Power & Participation in the Model Ecosystem (arXiv:2512.03073, published Nov 27, 2025)
- The German Commons: 154 Billion Tokens of Openly Licensed Text for German Language Models (arXiv:2510.13996, published Oct 15, 2025)