---
license: apache-2.0
tags:
- pytorch
---

<a id="top"></a>
<div align="center">
  <h1>Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval</h1>

  <p>
    <b>Hao Liu</b><sup>1</sup>&nbsp;
    <b>Yupeng Hu</b><sup>1✉</sup>&nbsp;
    <b>Kun Wang</b><sup>1</sup>&nbsp;
    <b>Yinwei Wei</b><sup>1</sup>&nbsp;
    <b>Liqiang Nie</b><sup>2</sup>
  </p>

  <p>
    <sup>1</sup>School of Software, Shandong University, Jinan, China<br>
    <sup>2</sup>School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), Shenzhen, China
  </p>
</div>

This is the official PyTorch implementation of **GOAL**, a frame-supervised Video Moment Retrieval (VMR) framework that achieves elastic boundary localization through a game-based paradigm and a Dynamic Updating Technique (DUT).

- 🔗 **Paper:** [SIGIR 2025](https://doi.org/10.1145/3726302.3729984)
- 🔗 **GitHub Repository:** [iLearn-Lab/SIGIR25-GOAL](https://github.com/iLearn-Lab/SIGIR25-GOAL)

---

## Model Information

### 1. Model Name
**GOAL** (**G**aming f**O**r el**A**stic **L**ocalization).

### 2. Task Type & Applicable Tasks
- **Task Type:** Frame-Supervised Video Moment Retrieval (VMR) / Temporal Localization / Vision-Language Learning
- **Applicable Tasks:** Retrieving the temporal moment in a video that matches a natural language query using a single annotated frame, with a focus on ambiguous temporal boundary localization.

### 3. Project Introduction
Frame-supervised Video Moment Retrieval (VMR) aims to retrieve the temporal moment in a video that matches a natural language query using only a single annotated frame. While this setting reduces annotation cost, it makes temporal boundary prediction highly ambiguous, because a single frame gives no direct indication of where the moment starts or ends.

**GOAL** addresses this challenge through a **game-based paradigm** with three players, namely **KFP**, **AFP**, and **BP**, together with a **Dynamic Updating Technique (DUT)** that progressively refines boundary decisions through unilateral and bilateral updates for more elastic localization.

### 4. Training Data Source
The model is trained and evaluated on standard frame-supervised VMR benchmarks:
- **ActivityNet Captions**
- **Charades-STA**
- **TACoS**
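
For orientation, VMR benchmarks like these are commonly scored with temporal Intersection over Union (IoU) between the predicted and ground-truth moments, usually reported as recall at fixed IoU thresholds. This README does not state which metric the evaluation script uses, so the sketch below is general background rather than this repository's code; the names `temporal_iou` and `recall_at_iou` are hypothetical.

```python
from typing import Sequence, Tuple

# A moment is a (start_sec, end_sec) pair with start_sec <= end_sec.
Moment = Tuple[float, float]


def temporal_iou(pred: Moment, gt: Moment) -> float:
    """Intersection over union of two 1-D time intervals."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(preds: Sequence[Moment], gts: Sequence[Moment], threshold: float) -> float:
    """Fraction of queries whose top-1 prediction reaches the IoU threshold."""
    if not gts:
        return 0.0
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)
```

For example, a prediction of `(0.0, 10.0)` against a ground truth of `(5.0, 15.0)` overlaps for 5 seconds out of a 15-second union, so its IoU is 1/3 and it would not count as a hit at a threshold of 0.5.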

---

## Usage & Basic Inference

This codebase provides training and evaluation scripts for frame-supervised VMR, as well as checkpoints for quick reproduction.

### Step 1: Prepare the Environment
Clone the GitHub repository and install dependencies:
```bash
git clone https://github.com/iLearn-Lab/SIGIR25-GOAL.git
cd SIGIR25-GOAL
python -m venv .venv
source .venv/bin/activate   # Linux / Mac
# .venv\Scripts\activate    # Windows
# PyTorch is also required; install the build that matches your platform and CUDA version
pip install numpy scipy pyyaml tqdm
```

### Step 2: Download Model Weights & Data
Prepare features and raw annotations following [ViGA](https://github.com/r-cui/ViGA)'s dataset preparation protocol.

Before running the code, please check and replace local dataset and feature paths in:
- `src/config.yaml`
- `src/utils/utils.py`
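
Stale local paths are easy to miss, so a quick check before training or evaluation can help. The following is a minimal sketch, not part of the repository: it assumes nothing about the key layout of `src/config.yaml` and simply walks every string value, reporting those that look like filesystem paths but do not exist. The helper names are hypothetical, and paths hard-coded in `src/utils/utils.py` still need to be checked by hand.

```python
from pathlib import Path
from typing import Any, Iterator, List, Tuple


def iter_string_values(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (dotted_key, value) for every string leaf in a nested config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{prefix}.{key}" if prefix else str(key)
            yield from iter_string_values(value, child)
    elif isinstance(node, (list, tuple)):
        for i, value in enumerate(node):
            yield from iter_string_values(value, f"{prefix}[{i}]")
    elif isinstance(node, str):
        yield prefix, node


def find_missing_paths(config: Any) -> List[Tuple[str, str]]:
    """Return (key, value) pairs that look like paths but do not exist on disk."""
    missing = []
    for key, value in iter_string_values(config):
        # Heuristic (an assumption): a value is a path if it contains a
        # path separator and is not a URL.
        looks_like_path = ("/" in value or "\\" in value) and "://" not in value
        if looks_like_path and not Path(value).expanduser().exists():
            missing.append((key, value))
    return missing


def check_config_file(config_path: str = "src/config.yaml") -> List[Tuple[str, str]]:
    """Load a YAML config and list path-like values that are missing."""
    import yaml  # PyYAML (installed in Step 1); imported lazily

    with open(config_path, "r", encoding="utf-8") as f:
        config = yaml.safe_load(f) or {}
    return find_missing_paths(config)
```

Calling `check_config_file()` from the repository root returns an empty list when every path-like value resolves; otherwise each entry names the config key and the path to fix.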


### Step 3: Run Inference

To evaluate a trained experiment folder, run:
```bash
python -m src.experiment.eval --exp path/to/your/experiment_folder
```
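
When several experiment folders need to be evaluated, the command above can be wrapped in a small script. This is a sketch under two assumptions: it is run from the repository root, and every subdirectory of the given root is an experiment folder accepted by `--exp`. The function names are hypothetical.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_eval_command(exp_dir: str) -> List[str]:
    """Build the evaluation command from Step 3 for one experiment folder."""
    return [sys.executable, "-m", "src.experiment.eval", "--exp", str(exp_dir)]


def evaluate_all(root: str) -> None:
    """Evaluate each subdirectory of `root` in sorted order, stopping on the first failure."""
    for exp_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        print(f"Evaluating {exp_dir}")
        subprocess.run(build_eval_command(str(exp_dir)), check=True)
```

Using `sys.executable` keeps the evaluation inside the same virtual environment that runs the wrapper, and `check=True` raises an error as soon as one run fails instead of silently continuing.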

---

## Limitations & Notes

**Disclaimer:** This repository is intended for **academic research purposes only**.
- The model requires access to the original benchmark datasets and extracted video features for evaluation.
- Some configuration files currently contain local path settings and should be updated before use.

---

## Citation

If you find our work useful in your research, please consider citing our paper:

```bibtex
@inproceedings{liu2025gaming,
  title={Gaming for Boundary: Elastic Localization for Frame-Supervised Video Moment Retrieval},
  author={Liu, Hao and Hu, Yupeng and Wang, Kun and Wei, Yinwei and Nie, Liqiang},
  booktitle={Proceedings of the 48th International ACM SIGIR Conference on Research and Development in Information Retrieval},
  year={2025},
  doi={10.1145/3726302.3729984}
}
```

---

## Contact
If you have any questions, feel free to contact me at liuh90210@gmail.com.