iTryOn: Mastering Interactive Video Virtual Try-On with Spatial-Semantic Guidance Paper • 2605.21431 • Published 3 days ago • 2
Video2GUI: Synthesizing Large-Scale Interaction Trajectories for Generalized GUI Agent Pretraining Paper • 2605.14747 • Published 9 days ago • 142
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization Paper • 2605.13641 • Published 10 days ago • 48
Mean Mode Screaming: Mean–Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 16 days ago • 186
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published 17 days ago • 98
Heterogeneous Scientific Foundation Model Collaboration Paper • 2604.27351 • Published 23 days ago • 217
TexOCR: Advancing Document OCR Models for Compilable Page-to-LaTeX Reconstruction Paper • 2604.22880 • Published 29 days ago • 9
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Paper • 2604.05172 • Published Apr 6 • 24
Adam's Law: Textual Frequency Law on Large Language Models Paper • 2604.02176 • Published Apr 2 • 503
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published Apr 9 • 115
Efficient and Principled Scientific Discovery through Bayesian Optimization: A Tutorial Paper • 2604.01328 • Published Apr 1 • 9