	---
	license: apache-2.0
	tags:
	- pytorch
	---

	<a id="top"></a>
	<div align="center">
	<h1>Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval</h1>

	<p>
	<b>Hao Liu</b><sup>1</sup>
	<b>Yupeng Hu</b><sup>1✉</sup>
	<b>Kun Wang</b><sup>1</sup>
	<b>Yinwei Wei</b><sup>1</sup>
	<b>Liqiang Nie</b><sup>2</sup>
	</p>

	<p>
	<sup>1</sup>School of Software, Shandong University, Jinan, China<br>
	<sup>2</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
	</p>
	</div>

	This is the official PyTorch implementation of GOAL, a frame-supervised Video Moment Retrieval (VMR) framework that performs elastic boundary localization through a game-based paradigm and a Dynamic Updating Technique (DUT).

	🔗 Paper: [SIGIR 2025](https://doi.org/10.1145/3726302.3729984)<br>
	🔗 GitHub Repository: [iLearn-Lab/SIGIR25-GOAL](https://github.com/iLearn-Lab/SIGIR25-GOAL)

	---

	## Model Information

	### 1. Model Name
	GOAL (Gaming fOr elAstic Localization).

	### 2. Task Type & Applicable Tasks
	- Task Type: Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
	- Applicable Tasks: Retrieving the temporal moment in a video that matches a natural language query when only a single annotated frame is available as supervision, with a focus on resolving ambiguous temporal boundaries.

	### 3. Project Introduction
	Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query using only a single annotated frame. While this setting reduces annotation cost, it brings severe ambiguity in temporal boundary prediction.

	GOAL addresses this challenge through a game-based paradigm with three players, namely KFP, AFP, and BP, together with a Dynamic Updating Technique (DUT) that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.
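
To make the boundary ambiguity concrete, here is a toy sketch of growing a moment outward from one annotated frame. It is not GOAL's algorithm: it uses plain threshold-based seed expansion, and the function name `grow_moment`, the per-frame scores, and the threshold are all hypothetical. The actual players and DUT update rules are defined in the paper and the repository.

```python
from typing import List, Tuple


def grow_moment(scores: List[float], seed: int, threshold: float = 0.5) -> Tuple[int, int]:
    """Expand an inclusive [start, end] frame span outward from the seed frame.

    Each side keeps growing while the neighbouring frame's score stays
    at or above the threshold. This is a generic heuristic, not DUT.
    """
    if not 0 <= seed < len(scores):
        raise ValueError("seed must index into scores")
    start = end = seed
    while start > 0 and scores[start - 1] >= threshold:
        start -= 1
    while end < len(scores) - 1 and scores[end + 1] >= threshold:
        end += 1
    return start, end


# Hypothetical per-frame query relevance scores; frame 4 is the annotated frame.
scores = [0.1, 0.2, 0.7, 0.8, 0.9, 0.6, 0.3, 0.1]
print(grow_moment(scores, seed=4))                  # (2, 5)
print(grow_moment(scores, seed=4, threshold=0.25))  # (2, 6)
```

The same annotated frame yields different boundaries depending on the threshold, which is the kind of ambiguity a fixed rule cannot resolve.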

	### 4. Training Data Source
	The model is trained and evaluated on standard frame-supervised VMR benchmarks:
	- ActivityNet Captions
	- Charades-STA
	- TACoS
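
Results on these benchmarks are conventionally reported as recall at temporal IoU thresholds between predicted and ground-truth moments. The following is a generic reference sketch of that metric, not the repository's evaluation code; the function names and example values are illustrative.

```python
from typing import List, Tuple

Span = Tuple[float, float]  # (start_seconds, end_seconds)


def temporal_iou(pred: Span, gt: Span) -> float:
    """Intersection over union of two temporal spans."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: List[Span], gts: List[Span], threshold: float) -> float:
    """Fraction of queries whose top prediction reaches the IoU threshold."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts) if gts else 0.0


print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # 0.5
```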

	---

	## Usage & Basic Inference

	This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.

	### Step 1: Prepare the Environment
	Clone the GitHub repository and install dependencies:
	```bash
	git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
	cd SIGIR25-GOAL
	python -m venv .venv
	source .venv/bin/activate # Linux / Mac
	# .venv\Scripts\activate # Windows
	pip install numpy scipy pyyaml tqdm
	# PyTorch is also required; install the build that matches your CUDA setup.
	```

	### Step 2: Download Model Weights & Data
	Prepare features and raw annotations following [ViGA](https://github.com/r-cui/ViGA)'s dataset preparation protocol.

	Before running the code, please check and replace local dataset and feature paths in:
	- `src/config.yaml`
	- `src/utils/utils.py`
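
As a quick sanity check before training or evaluation, a small script can flag paths in those two files that do not exist on your machine. This is a heuristic sketch and not part of the repository: it assumes the local paths are absolute POSIX-style paths, and the regex may produce occasional false positives.

```python
import re
from pathlib import Path
from typing import List

# Matches absolute POSIX-style paths such as /data/features/charades.
# Assumption: local paths in the files are written as absolute paths.
ABS_PATH = re.compile(r"(?<![\w/.:])/(?:[\w.\-]+/)*[\w.\-]+")


def find_missing_paths(text: str) -> List[str]:
    """Return absolute paths mentioned in `text` that do not exist locally."""
    missing: List[str] = []
    for candidate in ABS_PATH.findall(text):
        if candidate not in missing and not Path(candidate).exists():
            missing.append(candidate)
    return missing


if __name__ == "__main__":
    # Run from the repository root so the relative file names resolve.
    for name in ("src/config.yaml", "src/utils/utils.py"):
        file = Path(name)
        if not file.is_file():
            print(f"{name}: not found (run from the repository root)")
            continue
        for path in find_missing_paths(file.read_text(encoding="utf-8")):
            print(f"{name}: path does not exist: {path}")
```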


	### Step 3: Run Inference

	To evaluate a trained experiment folder, run:
	```bash
	python -m src.experiment.eval --exp path/to/your/experiment_folder
	```
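
If you keep several experiment folders side by side, the same command can be looped from Python. This is a minimal sketch that assumes a hypothetical layout with one subfolder per experiment under a common root; `build_eval_command` and `evaluate_all` are illustrative helpers, not part of the repository.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_eval_command(exp_dir: Path) -> List[str]:
    """Build the documented evaluation command for one experiment folder."""
    return [sys.executable, "-m", "src.experiment.eval", "--exp", str(exp_dir)]


def evaluate_all(root: Path, dry_run: bool = True) -> List[List[str]]:
    """Evaluate every subfolder of `root`; with dry_run, only print the commands."""
    commands = [build_eval_command(p) for p in sorted(root.iterdir()) if p.is_dir()]
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # Must be launched from the repository root so `src` is importable.
            subprocess.run(cmd, check=True)
    return commands
```

Call `evaluate_all(Path("path/to/your/experiments"), dry_run=False)` from the repository root to run the evaluations; the default `dry_run=True` only prints what would be executed.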

	---

	## Limitations & Notes

	Disclaimer: This repository is intended for academic research purposes only.
	- The model requires access to the original benchmark datasets and extracted video features for evaluation.
	- Some configuration files currently contain local path settings and should be updated before use.

	---

	## Citation

	If you find our work useful in your research, please consider citing our paper:

	```bibtex
	@inproceedings{liu2025gaming,
	title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
	author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
	booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
	year={2025},
	doi={10.1145/3726302.3729984}
	}
	```

	---

	## Contact
	If you have any questions, feel free to contact me at liuh90210@gmail.com.