Papers
arxiv:2503.19240

Beyond Object Categories: Multi-Attribute Reference Understanding for Visual Grounding

Published on Mar 25, 2025
Authors:
,
,
,
,

Abstract

Multi-ref EC framework integrates state descriptions, derived intentions, and embodied gestures for object localization in human-robot interactions, demonstrating superior performance over single-attribute approaches.

AI-generated summary

Referring expression comprehension (REC) aims at achieving object localization based on natural language descriptions. However, existing REC approaches are constrained by object category descriptions and single-attribute intention descriptions, hindering their application in real-world scenarios. In natural human-robot interactions, users often express their desires through individual states and intentions, accompanied by guiding gestures, rather than detailed object descriptions. To address this challenge, we propose Multi-ref EC, a novel task framework that integrates state descriptions, derived intentions, and embodied gestures to locate target objects. We introduce the State-Intention-Gesture Attributes Reference (SIGAR) dataset, which combines state and intention expressions with embodied references. Through extensive experiments with various baseline models on SIGAR, we demonstrate that properly ordered multi-attribute references contribute to improved localization performance, revealing that single-attribute reference is insufficient for natural human-robot interaction scenarios. Our findings underscore the importance of multi-attribute reference expressions in advancing visual-language understanding.

Community

Sign up or log in to comment

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2503.19240 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2503.19240 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2503.19240 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.