arxiv:2605.09976

OZ-TAL: Online Zero-Shot Temporal Action Localization

Published on May 11

Authors:

Abstract

Online Zero-shot Temporal Action Localization enables detection of previously unseen actions in streaming videos using vision-language models without additional training.

AI-generated summary

Online Temporal Action Localization (On-TAL) aims to detect the occurrence time and category of actions in untrimmed streaming videos immediately upon their completion. Recent advancements in this field focus on developing more sophisticated frameworks, shifting from Online Action Detection (OAD)-based aggregation paradigm to instance-level understanding. However, existing approaches are typically trained on specific domains and often exhibit limited generalization capabilities when applied to arbitrary videos, particularly in the presence of previously unseen actions. In this paper, we introduce a new task called Online Zero-shot Temporal Action Localization (OZ-TAL), which aims to detect previously unseen actions in an online fashion. Furthermore, we propose a training-free framework that leverages off-the-shelf Vision-Language Models (VLMs) while introducing additional mechanisms to enhance visual representations and mitigate their inherent biases. We establish new benchmarks and representative baselines for OZ-TAL on THUMOS14 and ActivityNet-1.3, and extensive experiments demonstrate that our method substantially outperforms existing state-of-the-art approaches under both offline and online zero-shot settings.

View arXiv page View PDF Add to collection

Community

Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.

Tap or paste here to upload images

· Sign up or log in to comment

Upvote

Get this paper in your agent:

hf papers read 2605.09976

Don't have the latest CLI?

curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper 0

No model linking this paper

Cite arxiv.org/abs/2605.09976 in a model README.md to link it from this page.

Datasets citing this paper 0

No dataset linking this paper

Cite arxiv.org/abs/2605.09976 in a dataset README.md to link it from this page.

Spaces citing this paper 0

No Space linking this paper

Cite arxiv.org/abs/2605.09976 in a Space README.md to link it from this page.

Collections including this paper 0

No Collection including this paper

Add this paper to a collection to link it from this page.