How to use zhengmh/TaRO-8B with Transformers:
# Load model directly
from transformers import AutoProcessor, AutoModelForMultimodalLM
processor = AutoProcessor.from_pretrained("zhengmh/TaRO-8B")
model = AutoModelForMultimodalLM.from_pretrained("zhengmh/TaRO-8B")
Temporal-Aware Reasoning Optimization for Video Temporal Grounding
[🐙 GitHub Repository] • [📄 Paper PDF] • [🏠 Project Page]
✨ Introduction
This paper introduces TaRO (Temporal-Aware Reasoning Optimization), a novel framework designed to enhance the reasoning capabilities of Multi-modal Large Language Models (MLLMs) for Video Temporal Grounding (VTG). Existing reinforcement learning models often produce superficial reasoning because they rely on inefficient random exploration and reward functions that only evaluate the correctness of the final answer.
To solve this, TaRO explicitly encourages the model to "think with time" using three main components:
- Constructive Reasoning Exploration: Leverages pre-generated dense captions to build high-quality reasoning paths grounded in explicit visual cues and timestamps, guiding the model's initial learning.
- Temporal-Sensitivity Reward: Evaluates the quality of the model's reasoning by shuffling video frames near ground-truth boundaries; if the reasoning is genuinely anchored to specific events, the model's confidence will appropriately drop when the temporal order is disrupted.
- Progressive Curriculum: Smoothly transitions the model from supervised imitation of the constructed reasoning paths to autonomous self-exploration.
Through these methods, TaRO ensures reasoning is strictly anchored to critical visual-temporal evidence, achieving state-of-the-art zero-shot performance across multiple VTG benchmarks.
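To make the idea behind the Temporal-Sensitivity Reward more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the implementation from the paper or the codebase: the function names, the `window` parameter, the per-boundary shuffling, and the clipped-difference form of the reward are all hypothetical, and `ordered_fraction` is a toy stand-in for the model's confidence.

```python
import random
from typing import Callable, List, Sequence, Tuple


def shuffle_near_boundaries(
    frames: Sequence,
    timestamps: Sequence[float],
    gt_span: Tuple[float, float],
    window: float,
    rng: random.Random,
) -> List:
    """Shuffle only the frames within `window` seconds of each ground-truth boundary."""
    shuffled = list(frames)
    for boundary in gt_span:
        # Indices of frames close to this boundary (assumed selection rule).
        idx = [i for i, t in enumerate(timestamps) if abs(t - boundary) <= window]
        values = [shuffled[i] for i in idx]
        rng.shuffle(values)
        for i, v in zip(idx, values):
            shuffled[i] = v
    return shuffled


def temporal_sensitivity_reward(
    confidence_fn: Callable[[Sequence], float],
    frames: Sequence,
    timestamps: Sequence[float],
    gt_span: Tuple[float, float],
    window: float = 2.0,
    seed: int = 0,
) -> float:
    """Confidence drop caused by disrupting temporal order, clipped at zero (assumed form)."""
    rng = random.Random(seed)
    original = confidence_fn(frames)
    perturbed = confidence_fn(
        shuffle_near_boundaries(frames, timestamps, gt_span, window, rng)
    )
    return max(0.0, original - perturbed)


def ordered_fraction(frames: Sequence[int]) -> float:
    """Toy confidence: fraction of adjacent frame pairs that are in increasing order."""
    pairs = list(zip(frames, frames[1:]))
    return sum(a < b for a, b in pairs) / len(pairs)


frames = list(range(10))                   # frame ids standing in for real frames
timestamps = [float(i) for i in frames]    # one frame per second
reward = temporal_sensitivity_reward(ordered_fraction, frames, timestamps, (3.0, 6.0))
```

In this sketch, a scorer that ignores frame order gets a reward of zero, while a scorer whose confidence depends on the order of events near the boundaries is rewarded in proportion to how much its confidence falls.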
🚀 Quick Start
Please refer to our official codebase for full installation and inference instructions.
- Clone the repository and install dependencies:
git clone https://github.com/oceanflowlab/TaRO
cd TaRO
- Download this model and launch the interactive demo:
python demo.py --model /path/to/model
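If you prefer to script the download step, the following Python sketch shows one possible way to do it. It assumes the `huggingface_hub` package is installed and that `demo.py` accepts a local directory for `--model`, as the command above suggests; `download_model` and `demo_command` are hypothetical helper names, not part of the official codebase.

```python
from typing import List


def download_model(repo_id: str = "zhengmh/TaRO-8B") -> str:
    """Download the model snapshot from the Hub and return its local directory."""
    # Imported lazily so demo_command() can be used without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id)


def demo_command(model_dir: str) -> List[str]:
    """Build the demo invocation shown above for a given local model directory."""
    return ["python", "demo.py", "--model", model_dir]


# Usage (run from the repository root; requires network access):
#   import subprocess
#   subprocess.run(demo_command(download_model()), check=True)
```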
For complete details on evaluation and training, please visit our GitHub Repository.
📝 Citation
If you find our work or model helpful for your research, please consider citing our paper:
@InProceedings{Zheng_2026_ICML,
  author = {Zheng, Minghang and Yin, Zihao and Yang, Yi and Peng, Yuxin and Liu, Yang},
  title = {Temporal-Aware Reasoning Optimization for Video Temporal Grounding},
  booktitle = {International Conference on Machine Learning},
  year = {2026}
}