ClawMark: A Living-World Benchmark for Multi-Turn, Multi-Day, Multimodal Coworker Agents Paper • 2604.23781 • Published Apr 26 • 33
SketchVLM: Vision language models can annotate images to explain thoughts and guide users Paper • 2604.22875 • Published Apr 23 • 35
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation Paper • 2604.24763 • Published Apr 27 • 71
Why Fine-Tuning Encourages Hallucinations and How to Fix It Paper • 2604.15574 • Published Apr 16 • 25
Rewarding the Scientific Process: Process-Level Reward Modeling for Agentic Data Analysis Paper • 2604.24198 • Published Apr 27 • 22
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models Paper • 2604.17565 • Published Apr 19 • 10
For-Value: Efficient Forward-Only Data Valuation for finetuning LLMs and VLMs Paper • 2508.10180 • Published Apr 25 • 18
Efficient Agent Evaluation via Diversity-Guided User Simulation Paper • 2604.21480 • Published Apr 23 • 15
Taming Actor-Observer Asymmetry in Agents via Dialectical Alignment Paper • 2604.19548 • Published Apr 21 • 16
Zero-to-CAD: Agentic Synthesis of Interpretable CAD Programs at Million-Scale Without Real Data Paper • 2604.24479 • Published Apr 27 • 9
PageGuide: Browser extension to assist users in navigating a webpage and locating information Paper • 2604.23772 • Published Apr 26 • 7
Learning to Identify Out-of-Distribution Objects for 3D LiDAR Anomaly Segmentation Paper • 2604.23604 • Published Apr 26 • 6
ATTN-FIQA: Interpretable Attention-based Face Image Quality Assessment with Vision Transformers Paper • 2604.22841 • Published Apr 21 • 5
RaV-IDP: A Reconstruction-as-Validation Framework for Faithful Intelligent Document Processing Paper • 2604.23644 • Published Apr 26 • 5
Discovering Agentic Safety Specifications from 1-Bit Danger Signals Paper • 2604.23210 • Published Apr 25 • 4
EX-FIQA: Leveraging Intermediate Early eXit Representations from Vision Transformers for Face Image Quality Assessment Paper • 2604.22842 • Published Apr 21 • 3
Towards Understanding the Robustness of Sparse Autoencoders Paper • 2604.18756 • Published Apr 20 • 10
BARRED: Synthetic Training of Custom Policy Guardrails via Asymmetric Debate Paper • 2604.25203 • Published Apr 28 • 8