Building AI Coding Agents for the Terminal: Scaffolding, Harness, Context Engineering, and Lessons Learned Paper • 2603.05344 • Published 5 days ago • 4
OfficeQA Pro: An Enterprise Benchmark for End-to-End Grounded Reasoning Paper • 2603.08655 • Published 1 day ago • 2
CARE-Edit: Condition-Aware Routing of Experts for Contextual Image Editing Paper • 2603.08589 • Published 1 day ago • 30
CoVe: Training Interactive Tool-Use Agents via Constraint-Guided Verification Paper • 2603.01940 • Published 8 days ago • 23
VGGT-Det: Mining VGGT Internal Priors for Sensor-Geometry-Free Multi-View Indoor 3D Object Detection Paper • 2603.00912 • Published 10 days ago • 35
CHIMERA: Compact Synthetic Data for Generalizable LLM Reasoning Paper • 2603.00889 • Published 10 days ago • 48
RubricBench: Aligning Model-Generated Rubrics with Human Standards Paper • 2603.01562 • Published 9 days ago • 56
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 12 days ago • 196
Improving Multi-task Learning via Seeking Task-based Flat Regions Paper • 2211.13723 • Published Nov 24, 2022 • 3
The Invisible Leash: Why RLVR May Not Escape Its Origin Paper • 2507.14843 • Published Jul 20, 2025 • 85
"PhyWorldBench": A Comprehensive Evaluation of Physical Realism in Text-to-Video Models Paper • 2507.13428 • Published Jul 17, 2025 • 16
Does More Inference-Time Compute Really Help Robustness? Paper • 2507.15974 • Published Jul 21, 2025 • 7
PrefPalette: Personalized Preference Modeling with Latent Attributes Paper • 2507.13541 • Published Jul 17, 2025 • 8
ForCenNet: Foreground-Centric Network for Document Image Rectification Paper • 2507.19804 • Published Jul 26, 2025 • 12
Rep-MTL: Unleashing the Power of Representation-level Task Saliency for Multi-Task Learning Paper • 2507.21049 • Published Jul 28, 2025 • 41
Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers Paper • 2504.20752 • Published Apr 29, 2025 • 94