Data Journalist Agent: Transforming Data into Verifiable Multimodal Stories Paper • 2606.11176 • Published 16 days ago • 125
TRL-Bench: Standardizing Cross-Paradigm Representation-Level Evaluation of Tabular Encoders Paper • 2606.09323 • Published 17 days ago • 51
How and What to Imagine? Visual Thinking in Unified Multimodal Models for Cross-View Spatial Reasoning Paper • 2605.27310 • Published about 1 month ago • 20
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models Paper • 2605.21573 • Published May 20 • 111
Forecasting Downstream Performance of LLMs With Proxy Metrics Paper • 2605.18607 • Published May 18 • 14
RiT: Vanilla Diffusion Transformers Suffice in Representation Space Paper • 2605.21981 • Published May 21 • 10
Do Enterprise Systems Need Learned World Models? The Importance of Context to Infer Dynamics Paper • 2605.12178 • Published May 12 • 65
Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure Paper • 2604.11045 • Published Apr 13 • 26
FORGE: Fine-grained Multimodal Evaluation for Manufacturing Scenarios Paper • 2604.07413 • Published Apr 8 • 97
Communicating about Space: Language-Mediated Spatial Integration Across Partial Views Paper • 2603.27183 • Published Mar 28 • 20
CUA-Suite: Massive Human-annotated Video Demonstrations for Computer-Use Agents Paper • 2603.24440 • Published Mar 25 • 99