Papers
arxiv:2506.05782

GazeNLQ @ Ego4D Natural Language Queries Challenge 2025

Published on Jun 6, 2025
Authors:
,
,
,

Abstract

A novel approach called GazeNLQ is presented that uses gaze estimation to improve video segment retrieval based on natural language queries, achieving strong performance on the Ego4D NLQ challenge.

This report presents our solution to the Ego4D Natural Language Queries (NLQ) Challenge at CVPR 2025. Egocentric video captures the scene from the wearer's perspective, where gaze serves as a key non-verbal communication cue that reflects visual attention and offer insights into human intention and cognition. Motivated by this, we propose a novel approach, GazeNLQ, which leverages gaze to retrieve video segments that match given natural language queries. Specifically, we introduce a contrastive learning-based pretraining strategy for gaze estimation directly from video. The estimated gaze is used to augment video representations within proposed model, thereby enhancing localization accuracy. Experimental results show that GazeNLQ achieves R1@IoU0.3 and R1@IoU0.5 scores of 27.82 and 18.68, respectively. Our code is available at https://github.com/stevenlin510/GazeNLQ.

Community

Sign up or log in to comment

Get this paper in your agent:

hf papers read 2506.05782
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2506.05782 in a model README.md to link it from this page.

Datasets citing this paper 1

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2506.05782 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.