---
license: apache-2.0
tags:
- pytorch
---
<a id="top"></a>
<div align="center">
<h1>Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval</h1>
<p>
<b>Hao Liu</b><sup>1</sup>&nbsp;
<b>Yupeng Hu</b><sup>1✉</sup>&nbsp;
<b>Kun Wang</b><sup>1</sup>&nbsp;
<b>Yinwei Wei</b><sup>1</sup>&nbsp;
<b>Liqiang Nie</b><sup>2</sup>
</p>
<p>
<sup>1</sup>School of Software, Shandong University, Jinan, China<br>
<sup>2</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
</p>
</div>
This is the official PyTorch implementation of **GOAL**, a frame-supervised Video Moment Retrieval (VMR) framework that performs elastic boundary localization through a game-based paradigm and a Dynamic Updating Technique (DUT).
🔗 **Paper:** [SIGIR 2025](https://doi.org/10.1145/3726302.3729984)
🔗 **GitHub Repository:** [iLearn-Lab/SIGIR25-GOAL](https://github.com/iLearn-Lab/SIGIR25-GOAL)
---
## Model Information
### 1. Model Name
**GOAL** (**G**aming f**O**r el**A**stic **L**ocalization).
### 2. Task Type & Applicable Tasks
- **Task Type:** Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
- **Applicable Tasks:** Retrieving the temporal moment in a video that matches a natural language query using a single annotated frame, with a focus on ambiguous temporal boundary localization.
### 3. Project Introduction
Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query using only a single annotated frame. While this setting reduces annotation cost, it brings severe ambiguity in temporal boundary prediction.
**GOAL** addresses this challenge through a **game-based paradigm** with three players, namely **KFP**, **AFP**, and **BP**, together with a **Dynamic Updating Technique (DUT)** that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.
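To make the frame-supervised setting concrete, the sketch below shows what a training sample carries (a single timestamp) and why that label leaves the boundaries open. It is only an illustration: the class, its field names, and the helper function are hypothetical and are not taken from the GOAL codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameSupervisedSample:
    """One training sample; all field names are illustrative assumptions."""
    video_id: str
    query: str
    duration: float        # video length in seconds
    annotated_time: float  # the single labeled frame, in seconds


def span_covers_annotation(sample, start, end):
    """Return True if [start, end] is a valid span that contains the annotated frame."""
    if not (0.0 <= start <= end <= sample.duration):
        return False
    return start <= sample.annotated_time <= end


sample = FrameSupervisedSample("v_001", "a person opens the door", 30.0, 12.5)

# Several very different spans agree with the same single-frame label,
# which is the boundary ambiguity described above.
candidates = [(10.0, 15.0), (12.0, 13.0), (0.0, 30.0), (14.0, 20.0)]
consistent = [span for span in candidates if span_covers_annotation(sample, *span)]
print(consistent)
```

Three of the four candidate spans contain the annotated frame, so the label alone cannot tell them apart; the model has to infer where the moment actually starts and ends.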
### 4. Training Data Source
The model is trained and evaluated on standard frame-supervised VMR benchmarks:
- **ActivityNet Captions**
- **Charades-STA**
- **TACoS**
---
## Usage & Basic Inference
This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.
### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
```bash
git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
cd SIGIR25-GOAL
python -m venv .venv
source .venv/bin/activate # Linux / Mac
# .venv\Scripts\activate # Windows
pip install numpy scipy pyyaml tqdm
```
### Step 2: Download Model Weights & Data
Prepare features and raw annotations following [ViGA](https://github.com/r-cui/ViGA)'s dataset preparation protocol.
Before running the code, please check and replace local dataset and feature paths in:
- `src/config.yaml`
- `src/utils/utils.py`
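
A quick way to spot stale paths in the YAML file is to walk the loaded configuration and report path-like strings that do not exist on disk. The helper below is a minimal sketch, not part of the repository: the function names are hypothetical, the structure of `src/config.yaml` is not assumed, and "looks like a path" is a simple heuristic (the string contains a path separator), so URLs or other slash-containing values may be flagged as well.

```python
import os


def find_missing_paths(config, prefix=""):
    """Recursively collect (key, value) pairs whose string value looks like
    a filesystem path but does not exist on this machine."""
    missing = []
    if isinstance(config, dict):
        for key, value in config.items():
            missing.extend(find_missing_paths(value, f"{prefix}{key}."))
    elif isinstance(config, (list, tuple)):
        for index, value in enumerate(config):
            missing.extend(find_missing_paths(value, f"{prefix}{index}."))
    elif isinstance(config, str) and ("/" in config or os.sep in config):
        # Heuristic: treat any string containing a separator as a path.
        if not os.path.exists(os.path.expanduser(config)):
            missing.append((prefix.rstrip("."), config))
    return missing


def check_config_file(path="src/config.yaml"):
    """Load a YAML config and return its missing path entries."""
    import yaml  # PyYAML, installed in Step 1

    with open(path, "r", encoding="utf-8") as handle:
        return find_missing_paths(yaml.safe_load(handle))


# Illustrative configuration; the keys are made up for this example.
example = {
    "dataset": {"name": "charades", "feature_dir": "/nonexistent/goal_features"},
    "paths": [os.getcwd()],
}
print(find_missing_paths(example))
```

Running `check_config_file()` from the repository root lists the entries that still need to be replaced. Paths hard-coded in `src/utils/utils.py` are not covered by this check and have to be inspected by hand.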
### Step 3: Run Inference
To evaluate a trained experiment folder, run:
```bash
python -m src.experiment.eval --exp path/to/your/experiment_folder
```
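
The metrics printed by the evaluation script are not documented in this README. Moment retrieval benchmarks are commonly scored with temporal IoU and "R@1, IoU=m" (the fraction of queries whose top prediction overlaps the ground truth by at least m), so the sketch below shows how such numbers can be computed. It is an independent illustration with hypothetical function names, not the repository's evaluation code.

```python
def temporal_iou(pred, gt):
    """Temporal IoU between two (start, end) spans given in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_1(preds, gts, threshold):
    """Fraction of queries whose top-1 span reaches the IoU threshold."""
    if not preds:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(preds)


# Two toy queries: the first overlaps its ground truth, the second does not.
preds = [(2.0, 8.0), (10.0, 12.0)]
gts = [(4.0, 10.0), (0.0, 5.0)]

print(temporal_iou(preds[0], gts[0]))  # 4 s overlap / 8 s union = 0.5
print(recall_at_1(preds, gts, 0.5))    # 1 of 2 queries passes = 0.5
```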
---
## Limitations & Notes
**Disclaimer:** This repository is intended for **academic research purposes only**.
- The model requires access to the original benchmark datasets and extracted video features for evaluation.
- Some configuration files currently contain local path settings and should be updated before use.
---
## Citation
If you find our work useful in your research, please consider citing our paper:
```bibtex
@inproceedings{liu2025gaming,
title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
year={2025},
doi={10.1145/3726302.3729984}
}
```
---
## Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.