Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding Paper • 2409.03757 • Published Sep 5, 2024 • 3
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders Paper • 2412.01827 • Published Dec 2, 2024
PaintScene4D: Consistent 4D Scene Generation from Text Prompts Paper • 2412.04471 • Published Dec 5, 2024
AgMMU: A Comprehensive Agricultural Multimodal Understanding and Reasoning Benchmark Paper • 2504.10568 • Published Apr 14, 2025
Argus: Vision-Centric Reasoning with Grounded Chain-of-Thought Paper • 2505.23766 • Published May 29, 2025
PPTArena: A Benchmark for Agentic PowerPoint Editing Paper • 2512.03042 • Published Dec 2, 2025 • 1
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight Paper • 2511.20648 • Published Nov 25, 2025 • 1
Fast-ThinkAct: Efficient Vision-Language-Action Reasoning via Verbalizable Latent Planning Paper • 2601.09708 • Published Jan 14 • 56
OSGym: Scalable Distributed Data Engine for Generalizable Computer Agents Paper • 2511.11672 • Published Nov 11, 2025
LocateAnything: Fast and High-Quality Vision-Language Grounding with Parallel Box Decoding Paper • 2605.27365 • Published about 1 month ago • 144
Eagle 2.5: Boosting Long-Context Post-Training for Frontier Vision-Language Models Paper • 2504.15271 • Published Apr 21, 2025 • 69