One Model, Many Budgets: Elastic Latent Interfaces for Diffusion Transformers Paper • 2603.12245 • Published about 13 hours ago • 6
WeEdit: A Dataset, Benchmark and Glyph-Guided Framework for Text-centric Image Editing Paper • 2603.11593 • Published 1 day ago • 13
Multi-Task Reinforcement Learning for Enhanced Multimodal LLM-as-a-Judge Paper • 2603.11665 • Published about 23 hours ago • 1
Understanding by Reconstruction: Reversing the Software Development Process for LLM Pretraining Paper • 2603.11103 • Published 2 days ago • 2
Just-in-Time: Training-Free Spatial Acceleration for Diffusion Transformers Paper • 2603.10744 • Published 2 days ago • 5
CLIPO: Contrastive Learning in Policy Optimization Generalizes RLVR Paper • 2603.10101 • Published 3 days ago • 3
Code-Space Response Oracles: Generating Interpretable Multi-Agent Policies with Large Language Models Paper • 2603.10098 • Published 3 days ago • 1
V_{0.5}: Generalist Value Model as a Prior for Sparse RL Rollouts Paper • 2603.10848 • Published 1 day ago • 7
InternVL-U: Democratizing Unified Multimodal Models for Understanding, Reasoning, Generation and Editing Paper • 2603.09877 • Published 3 days ago • 36
MiniAppBench: Evaluating the Shift from Text to Interactive HTML Responses in LLM-Powered Assistants Paper • 2603.09652 • Published 3 days ago • 12
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning Paper • 2603.08655 • Published 4 days ago • 3
Penguin-VL: Exploring the Efficiency Limits of VLM with LLM-based Vision Encoders Paper • 2603.06569 • Published 7 days ago • 101
Reasoning Models Struggle to Control their Chains of Thought Paper • 2603.05706 • Published 7 days ago • 26