BRDFusion: Physics Meets Generation for Urban Scene Inverse Rendering Paper • 2606.17049 • Published 10 days ago • 27
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV Paper • 2605.26244 • Published about 1 month ago • 38
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published 30 days ago • 144
SkillOpt: Executive Strategy for Self-Evolving Agent Skills Paper • 2605.23904 • Published May 22 • 246
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models Paper • 2605.14906 • Published May 14 • 79 • 5
ARIS: Autonomous Research via Adversarial Multi-Agent Collaboration Paper • 2605.03042 • Published May 4 • 139 • 10