Do VLMs Need Vision Transformers? Evaluating State Space Models as Vision Encoders Paper • 2603.19209 • Published 12 days ago • 5
V-JEPA 2.1: Unlocking Dense Features in Video Self-Supervised Learning Paper • 2603.14482 • Published 16 days ago • 24
Omnilingual MT: Machine Translation for 1,600 Languages Paper • 2603.16309 • Published 14 days ago • 20
Strategic Navigation or Stochastic Search? How Agents and Humans Reason Over Document Collections Paper • 2603.12180 • Published 19 days ago • 64
VidEoMT: Your ViT is Secretly Also a Video Segmentation Model Paper • 2602.17807 • Published Feb 19 • 7
Causal-JEPA: Learning World Models through Object-Level Latent Interventions Paper • 2602.11389 • Published Feb 11 • 7
UPLiFT: Efficient Pixel-Dense Feature Upsampling with Local Attenders Paper • 2601.17950 • Published Jan 25 • 4
TCAndon-Router: Adaptive Reasoning Router for Multi-Agent Collaboration Paper • 2601.04544 • Published Jan 8 • 6
CASA: Cross-Attention via Self-Attention for Efficient Vision-Language Fusion Paper • 2512.19535 • Published Dec 22, 2025 • 12
AIS 2024 Challenge on Video Quality Assessment of User-Generated Content: Methods and Results Paper • 2404.16205 • Published Apr 24, 2024
INRetouch: Context Aware Implicit Neural Representation for Photography Retouching Paper • 2412.03848 • Published Dec 5, 2024
CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification Paper • 2204.14244 • Published Apr 29, 2022
Model-Based Image Signal Processors via Learnable Dictionaries Paper • 2201.03210 • Published Jan 10, 2022
High-Quality Image Restoration Following Human Instructions Paper • 2401.16468 • Published Jan 29, 2024 • 15
A Brief Overview of AI Governance for Responsible Machine Learning Systems Paper • 2211.13130 • Published Nov 21, 2022
H2O Open Ecosystem for State-of-the-art Large Language Models Paper • 2310.13012 • Published Oct 17, 2023 • 9
NILUT: Conditional Neural Implicit 3D Lookup Tables for Image Enhancement Paper • 2306.11920 • Published Jun 20, 2023 • 3