---
license: bsd-3-clause
pipeline_tag: video-text-to-text
---
# VideoMind-2B-FT-QVHighlights
<div style="display: flex; gap: 5px;">
<a href="https://arxiv.org/abs/2503.13444" target="_blank"><img src="https://img.shields.io/badge/arXiv-2503.13444-red"></a>
<a href="https://videomind.github.io/" target="_blank"><img src="https://img.shields.io/badge/Project-Page-brightgreen"></a>
<a href="https://github.com/yeliudev/VideoMind/blob/main/README.md" target="_blank"><img src="https://img.shields.io/badge/License-BSD--3--Clause-purple"></a>
<a href="https://github.com/yeliudev/VideoMind" target="_blank"><img src="https://img.shields.io/github/stars/yeliudev/VideoMind"></a>
</div>
VideoMind is a multi-modal agent framework that enhances video reasoning by emulating *human-like* processes, such as *breaking down tasks*, *localizing and verifying moments*, and *synthesizing answers*.
## 🔖 Model Details
### Model Description
- **Model type:** Multi-modal Large Language Model
- **Language(s):** English
- **License:** BSD-3-Clause
### More Details
Please refer to our [GitHub Repository](https://github.com/yeliudev/VideoMind) for more details about this model.
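The checkpoint weights can be fetched directly from the Hugging Face Hub. Below is a minimal sketch, assuming the repository id `yeliudev/VideoMind-2B-FT-QVHighlights` (inferred from this model card's title); the actual inference pipeline follows the instructions in the GitHub repository linked above and is not reproduced here.

```python
# Minimal download sketch (repo id assumed from this model card's title).
from huggingface_hub import snapshot_download

# Download the full checkpoint to a local cache directory.
local_dir = snapshot_download(repo_id="yeliudev/VideoMind-2B-FT-QVHighlights")
print(f"Checkpoint downloaded to: {local_dir}")
```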
## 📖 Citation
Please cite our paper if you find this project helpful.
```bibtex
@article{liu2025videomind,
  title={VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning},
  author={Liu, Ye and Lin, Kevin Qinghong and Chen, Chang Wen and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.13444},
  year={2025}
}
```