---
license: bsd-3-clause
pipeline_tag: video-text-to-text
---
# VideoMind-2B-FT-QVHighlights

VideoMind is a multi-modal agent framework that enhances video reasoning by emulating human-like processes, such as breaking down tasks, localizing and verifying moments, and synthesizing answers.
## Model Details

### Model Description
- Model type: Multi-modal Large Language Model
- Language(s): English
- License: BSD-3-Clause
### More Details

Please refer to our GitHub Repository for more details about this model.
## Citation

Please cite our paper if you find this project helpful.
```bibtex
@article{liu2025videomind,
  title={VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning},
  author={Liu, Ye and Lin, Kevin Qinghong and Chen, Chang Wen and Shou, Mike Zheng},
  journal={arXiv preprint arXiv:2503.13444},
  year={2025}
}
```