From Scale to Speed: Adaptive Test-Time Scaling for Image Editing Paper • 2603.00141 • Published 10 days ago
MobilityBench: A Benchmark for Evaluating Route-Planning Agents in Real-World Mobility Scenarios Paper • 2602.22638 • Published 9 days ago
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation Paper • 2602.18283 • Published 14 days ago
Code2World: A GUI World Model via Renderable Code Generation Paper • 2602.09856 • Published 24 days ago
Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation Paper • 2601.20614 • Published Jan 28
Urban Socio-Semantic Segmentation with Vision-Language Reasoning Paper • 2601.10477 • Published Jan 15
Thinking with Map: Reinforced Parallel Map-Augmented Agent for Geolocalization Paper • 2601.05432 • Published Jan 8
Taming Hallucinations: Boosting MLLMs' Video Understanding via Counterfactual Video Generation Paper • 2512.24271 • Published Dec 30, 2025
InfiGUI-R1: Advancing Multimodal GUI Agents from Reactive Actors to Deliberative Reasoners Paper • 2504.14239 • Published Apr 19, 2025
InfiGUI: Advanced Vision-Native Agent for GUI Interaction Collection • 7 items • Updated Oct 15, 2025
Computer-Use Agents as Judges for Generative User Interface Paper • 2511.15567 • Published Nov 19, 2025
Safe-Sora: Safe Text-to-Video Generation via Graphical Watermarking Paper • 2505.12667 • Published May 19, 2025
VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation Paper • 2511.02778 • Published Nov 4, 2025