Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want Paper • 2403.20271 • Published Mar 29, 2024 • 3
LLM as Dataset Analyst: Subpopulation Structure Discovery with Large Language Model Paper • 2405.02363 • Published May 3, 2024
MC-LLaVA: Multi-Concept Personalized Vision-Language Model Paper • 2411.11706 • Published Nov 18, 2024 • 1
RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation Paper • 2412.13877 • Published Dec 18, 2024
Efficient Deweather Mixture-of-Experts with Uncertainty-aware Feature-wise Linear Modulation Paper • 2312.16610 • Published Dec 27, 2023
WoW: Towards a World omniscient World model Through Embodied Interaction Paper • 2509.22642 • Published Sep 26, 2025 • 15
Robobench: A Comprehensive Evaluation Benchmark for Multimodal Large Language Models as Embodied Brain Paper • 2510.17801 • Published Oct 20, 2025 • 2
UniCTokens: Boosting Personalized Understanding and Generation via Unified Concept Tokens Paper • 2505.14671 • Published May 20, 2025
Wow, wo, val! A Comprehensive Embodied World Model Evaluation Turing Test Paper • 2601.04137 • Published Jan 7
Look Before Acting: Enhancing Vision Foundation Representations for Vision-Language-Action Models Paper • 2603.15618 • Published 1 day ago • 7
Are Video Models Ready as Zero-Shot Reasoners? An Empirical Study with the MME-CoF Benchmark Paper • 2510.26802 • Published Oct 30, 2025 • 34