WebVR: Benchmarking Multimodal LLMs for WebPage Recreation from Videos via Human-Aligned Visual Rubrics Paper • 2603.13391 • Published 21 days ago • 19
olmOCR Collection olmOCR is a document recognition pipeline for efficiently converting documents into plain text. olmocr.allenai.org • 12 items • Updated Dec 23, 2025 • 150
CoCo: Code as CoT for Text-to-Image Preview and Rare Concept Generation Paper • 2603.08652 • Published 22 days ago • 39
Proact-VL: A Proactive VideoLLM for Real-Time AI Companions Paper • 2603.03447 • Published 28 days ago • 37
GEBench: Benchmarking Image Generation Models as GUI Environments Paper • 2602.09007 • Published Feb 9 • 39
How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing Paper • 2602.01851 • Published Feb 2 • 16
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published Jan 9 • 86
Random Policy Valuation is Enough for LLM Reasoning with Verifiable Rewards Paper • 2509.24981 • Published Sep 29, 2025 • 29
Video-MTR: Reinforced Multi-Turn Reasoning for Long Video Understanding Paper • 2508.20478 • Published Aug 28, 2025 • 18
NextStep-1: Toward Autoregressive Image Generation with Continuous Tokens at Scale Paper • 2508.10711 • Published Aug 14, 2025 • 146